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Traditional  time  series  analysis  focuses  on  finding  the  optimal  model  to  fit 
the  data  in  a learning  period  and  using  this  model  to  make  predictions  in  a future 
period.  However,  many  practical  applications,  such  as  earthquake  time  series  or 
epileptic  brain  electroencephalogram  (EEG)  time  series,  may  only  contain  a few 
meaningful,  or  predictable  patterns,  which  can  be  used  for  meaningful  forecasting 
such  as  the  occurrences  of  some  specific  events  following  similar  patterns.  In  these 
cases,  the  traditional  time  series  model  such  as  the  autoregressive  {AR)  model 
usually  gives  poor  predictions  since  the  model  is  constructed  to  fit  the  entire 
learning  period,  while  the  pattern  useful  for  prediction  may  occur  during  only  a 
small  portion  of  the  period. 

The  purpose  of  this  research  is  to  provide  a statistical  algorithm  to  identify 
the  most  predictable  pattern  in  a given  time  series  and  to  apply  this  pattern  to 
make  predictions. 

In  this  dissertation,  we  propose  the  Pattern  Match  Signal  Identification 
(PMSI)  algorithm  to  identify  the  most  predictable  pattern  in  a given  time  series. 

In  this  algorithm,  the  concept  of  the  pattern  match  is  used  instead  of  the  generally 
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used  value  match  criterion.  The  most  predictable  pattern  is  then  identified  by 
the  significance  of  a test  statistic.  The  feasibility  of  this  algorithm  is  proved 
analytically  and  is  confirmed  by  simulation  studies.  An  epileptic  brain  EEG 
time  series  and  the  well  known  Wolf’s  monthly  sunspot  time  series  are  used  as 
applications  of  this  algorithm. 

A forecasting  method  based  on  the  identified  pattern  by  the  PMSI  algorithm 
is  introduced.  Multivariate  regression  models  are  applied  to  subsequences  in  the 
learning  period  with  the  most  predictable  patterns,  and  these  regression  equations 
are  used  to  make  predictions  in  a future  period.  The  performance  of  this  method  is 
compared  with  the  one  by  the  autoregressive  {AR)  models.  The  two  applications 
(EEG  and  sunspot  time  series)  show  that  the  proposed  forecasting  method  gives 
significantly  better  predictions  than  AR  models,  especially  for  more  step  ahead 
predictions. 
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CHAPTER  1 
INTRODUCTION 


1.1  Research  Scope  in  Time  Series  Analysis 

Time  series  analysis  is  concerned  with  data  which  are  not  independent, 
but  serially  correlated.  Usually,  the  relations  between  successive  observations 
are  used  for  predicting  the  future  behavior.  The  basic  idea  of  such  analysis  is  to 
determine  what  information  from  the  past  should  be  chosen  to  make  predictions. 
Therefore,  a modest  aim  of  a time  series  analysis  is  to  derive  a good  description 
from  a learning  period.  This  description  may  simply  consist  of  some  statistical 
summaries  or  graphical  representations,  but  it  can  also  be  used  to  forecast  future 
values  of  the  series.  The  use  of  the  available  observations  at  time  t to  forecast  its 
future  value  at  time  t + r can  provide  a basis  for  economics  and  business  planning, 
production  planning,  inventory  and  production  control,  or  control  and  optimization 
of  industrial  processes.  Many  methods  in  time  series  analysis  have  been  developed 
for  this  purpose. 

The  concept  of  stationarity  is  central  to  understand  the  probabilistic 
structure  of  a time  series.  The  idea  of  stationarity  is  that  the  probabilistic 
structure  of  the  time  series  is  not  affected  by  a shift  in  time  origin.  Although  most 
of  the  statistical  forecasting  methods  are  developed  for  stationary  time  series, 
many  time  series  in  industry,  business  and  economics  are  often  nonstationary  and, 
in  particular,  have  no  natural  mean.  However,  a nonstationary  time  series  may 
still  contain  portions  with  stationary  properties.  An  approach  for  searching  the 
stationary  parts  in  a nonstationary  time  series  is  the  focus  of  this  research. 
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1.2  Motivation  and  Purpose  of  Research 

Most  of  the  methods  for  time  series  analysis  try  to  identify  the  optimal 
model  to  fit  the  data  in  a learning  period  and  apply  this  model  to  predict 
the  future.  However,  many  time  series  may  only  contain  a few  meaningful,  or 
predictable  patterns,  especially  in  a nonstationary  time  series  such  as  earthquake 
time  series  and  electroencephalogram  (EEG)  where  the  interest  may  lie  on  the 
occurrences  of  some  specific  events.  For  example,  when  similar  patterns  appear 
repeatedly  in  the  observations,  it  may  appear  in  the  future.  In  these  cases,  the 
traditional  time  series  model  such  as  the  autoregressive  (AR)  model  usually  gives 
poor  predictions  since  the  model  was  constructed  to  fit  the  entire  learning  period 
where  a stable  pattern  useful  for  prediction  may  occur  only  in  a small  portion.  The 
importance  of  finding  the  consistent  patterns  and  applying  them  to  make  better 
predictions  for  such  time  series  motivated  this  dissertation. 

Hence,  for  a given  time  series,  the  purpose  of  this  research  is  to  answer 
the  following  two  important  questions:  (1)  What  are  the  predictable  signals  in 
a given  time  series  and  how  stable  are  they?  and  (2)  how  good  axe  these  signals 
in  making  the  future  predictions?  Motivated  by  the  concept  of  value  match 
criterion  in  calculating  the  approximate  entropy  (ApEn),  two  statistical  measures 
to  quantify  the  regularity  of  a time  series  are  proposed.  An  algorithm  based  on 
the  idea  of  pattern  match  criterion  to  examine  the  highly  repeatable  pattern  of 
signals  and  select  the  most  predictable  patterns  in  a given  time  series  by  statistical 
tests  is  constructed.  The  idea  is  that,  if  this  pattern  consistently  appears  in  the 
future,  we  should  have  a better  predictions  for  its  proceeding  values.  The  pattern 
identifiability  by  the  proposed  algorithm  will  be  verified  analytically  and  tested  by 
simulation  studies.  An  epileptic  EEG  time  series  and  Wolf’s  monthly  sunspot  time 
series  will  be  used  as  applications  in  this  dissertation. 
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1.3  Outline 

The  remainder  of  this  dissertation  is  organized  as  follows.  Chapter  2 
reviews  the  statistical  literature  concerning  approximate  entropy  analysis  and  some 
established  methods  for  nonstationary  time  series  analysis  and  models  for  analyzing 
Wolf’s  sunspot  time  series.  Chapter  3 introduces  the  concept  of  pattern  match 
criterion.  Two  new  regularity  scores  based  on  this  criterion  are  proposed.  Their 
performance  and  consistency  are  then  compared  with  the  ApEn  by  an  example 
of  well  known  MIX(p)  process.  In  Chapter  4 we  present  a PMSI  algorithm  for 
identifying  the  most  predictable  pattern  of  signals  in  a time  series  based  on  the 
concept  of  pattern  match  criterion  and  statistical  tests.  Analytical  proof  for  the 
pattern  identifiability  by  this  algorithm  and  its  applications  are  provided.  In 
Chapter  5 we  propose  a forecasting  method  based  on  the  results  of  the  PMSI 
algorithm  and  the  multivariate  regression  analysis.  The  forecasting  performance  of 
this  method  is  compared  with  autoregressive  {AR)  models  in  two  applications.  The 
conclusion  of  this  dissertation  and  a discussion  of  future  research  are  given  in  final 
Chapter  6. 


CHAPTER  2 
LITERATURE  REVIEW 


2.1  Quantification  of  Regularity  for  Time  Series:  Approximate  Entropy  (ApEn) 
Recently,  approximate  entropy  (ApEn)  (Pincus  1991,  Pincus  and  Huang 
1992),  a statistic  for  quantifying  the  system  regularity/complexity  of  a time  series, 
has  been  widely  applied  on  medical  data  analysis.  It  can  be  used  to  distinguish 
normal  from  abnormal  data  in  instances  where  moment  statistics  (e.g.,  mean 
and  variance)  approaches  fail  to  show  meaningful  difference,  such  as  in  heart  rate 
analysis  in  the  human  neonate  (Pincus  and  Viscarello  1992,  Pincus,  Cummins  and 
Haddad  1993),  and  in  epileptic  activity  analysis  in  electrocardiographic  recordings 
(Diambra,  Bastos  de  Figueiredo  and  Malta  1999).  Mathematically,  as  part  of  a 
general  theoretical  development,  ApEn  has  been  shown  to  be  the  rate  of  entropy 
for  an  approximating  Markov  Chain  to  a process  (Pincus  1992).  Most  importantly, 
compared  with  the  Kolmogorov-Sinai  (K-S)  entropy  (Kolmogorov  1958),  ApEn 
is  generally  finite  and  has  been  shown  that,  based  on  the  calculations  that 
included  theoretical  analyses  of  both  stochastic  and  deterministic  chaotic 
processes  (Pincus  1991,  Pincus  and  Keefe  1992)  and  clinical  applications  (Pincus, 
Gladstone  and  Ehrenkranz  1991,  Kaplan,  Furman,  Pincus,  Ryan,  Lipsitz  and 
Goldberger  1991),  it  can  classify  the  complexity  of  the  systems  by  as  few  as  1000 
data  points. 

The  main  steps  of  ApEn  analysis  are  described  as  follows: 

1.  Give  a time  series  U = {ui,  U2, . . . , it„},  measured  equally  spaced  in  time. 
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2.  Fix  I,  an  integer,  and  r,  a positive  real  number.  The  value  of  I is  the  length  of 
compared  subsequences  in  U,  and  r specifies  a tolerance  level.  The  choices  of 
I and  r may  vary  for  different  application  of  ApEn. 

3.  Define  ®,-  = {uj,  . . . , Form  a sequence  of  vectors  Xi,  X2,  ■■  ■,  ®n-m 

in  RK  These  vectors  are  subsequences  with  length  I in  U. 

4.  Use  the  sequences  Xi,  X2, . . . , Xn-i+i  to  construct,  for  each  i,  l<i<n  — l + l, 

. number  of  x/s  such  that  d{xi,Xi)  < r 
CAr)  = ; — — , where 

d{xi,  Xj)  = max  | m+k  ~ Uj+k  \ , i-e., 

0<k<l-l 

d{xi,Xj)  represents  the  maximum  distance  between  vectors  X{  and  Xj  in  their 
respective  scalar  components. 

5.  Define 

n— /+! 
i=l 

6.  The  approximate  entropy  is  then  defined  by 

ApEn  = $^(r)  — 


7.  Note  that 


-ApEn  = (r)  — (r) 


EL~;inCH‘(r) 
n — I 


EL-rMnCj(r) 

n — I + 1 


n 


^2(lnC«(r)-lnCi(r)) 


i=l 


1 

n — I 


n—l 


E>” 


C!(r)  ’ 
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and 


C!+'M 

_ Pr{\uj+k-Ui+k\  <r,k  = 0,1,. 

..,0 

Ci(r) 

Pr{\uj^k  - «i+fc|  < r.  A:  = 0, 1, . . . 

,/-l) 

= Pr{\uj+i  - Ui+i\  < r I \uj+k  - «t+fc|  < r,  A:  = 0, 1, . . . ,Z  - 1) 

Heuristically,  ApEn  quantifies  the  (logarithmic)  likelihood  that  subsequences 
in  U of  patterns  that  are  close  and  remain  close  on  the  next  increment.  Lower 
ApEn  value  indicates  that  the  given  time  series  is  more  regular,  or  more  correlated, 
and  larger  ApEn  value  means  more  complex,  or  more  independent.  However,  ApEn 
can  only  be  used  to  quantify  the  overall  regularity  in  a time  series.  It  cannot  be 
applied  to  detect  particularly  meaningful  signals  or  patterns  in  a time  series  or  to 
make  predictions.  Next  section  reviews  several  methodologies  for  the  analysis  and 
forecasting  in  nonstationary  time  series  data. 

2.2  Analysis  and  Forecasting  in  Nonstationary  Time  Series 

Most  of  the  stochastic  models  in  time  series  analysis  are  developed  for 
stationary  time  series.  Autogressive  (AR)  and  the  autoregressive  moving  average 
(ARMA)  models  have  been  the  standard  and  efficient  ways.  In  this  section,  we  will 
focus  on  the  recently  developed  methods  to  deal  with  nonstationary  time  series. 

2.2.1  ARIMA  Models 

Many  methods  have  been  used  to  analyze  and  forecast  nonstationary 
time  series  data.  One  of  the  well  known  methods,  proposed  by  Box  and  Jenkins 
(1976),  is  called  autoregressive  integrated  moving  average  (ARIMA)  process.  It 
is  an  extension  to  the  stationary  autoregressive  moving  average  (ARMA)  process. 
ARIMA  deals  with  nonstationary  time  series  by  using  the  difference  to  make 
the  nonstationary  time  series  a stationary  ARMA  process. 
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We  say  that  {t/t}  is  an  ARIMA  process  of  order  p,d,q,  i.e.,  yt  ~ 

ARIMA(p,  d,  g),  if  the  difference  of  yt  is  a stationary,  invertible  ARMA 
process  of  order  p and  q.  The  model  is  written  as 

t(B)(l  - bYv,  = e(B)j„ 

where  B is  the  backward  shift  operator,  {zt}  is  white  noise,  and  $(•)  and  ©(•) 
are  polynomials  of  degree  p and  q,  respectively,  with  all  roots  of  the  polynomial 
equations  $(2:)  = 0 and  0(2:)  = 0 outside  the  unit  circle. 

Autoregressive  integrated  moving  average  (ARIMA)  processes  provide 
a justification  for  differencing  a nonstationary  time  series  in  order  to  achieve 
stationary,  and  the  Box  and  Jenkins  approax:h  gives  a systematic  way  to  model 
a given  nonstationary  time  series.  However,  this  procedure  can  only  be  used  if 
the  nonstationarity  is  homogeneous,  which  means  that  the  same  finite  number 
of  differencing  applied  everywhere.  Furthermore,  even  if  differencing  produces 
a stationary  process,  it  is  not  guaranteed  that  the  forecasts  obtained  from  an 
estimated  ARIMA  model  are  optimal;  other  transformations  may  yield  better 
forecasts. 

2.2.2  Trend  Removing  Process 

Gardenfors  and  Hansson  (1980)  gave  a methodological  discussion  of  other 
ways  of  transforming  a nonstationary  time  series.  They  argued  that,  in  many  cases, 
removing  trends  is  better  than  differencing  in  many  aspects,  for  example,  the  time 
series  with  seasonal  patterns.  Three  cases  are  discussed  in  their  paper:  (1)  The 
real  process  is  ARMA  plus  a linear  trend,  (2)  the  real  process  is  ARIMA,  and 
(3)  the  real  process  is  unknown.  In  the  first  case,  when  comparing  two  methods 
of  transforming  a nonstationary  series  into  a stationaxy  one,  the  trend  removing 
procedure  is  much  better  than  the  differencing  method.  For  the  second  case,  they 
concluded  that  even  if  differencing  is  the  only  method  to  make  the  series  an  ARMA 
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process,  there  are  some  cases  where  first  removing  a trend  and  then  estimating 
an  ARMA  process  is  better.  When  the  process  is  unknown,  it  is  not  possible  to 
argue  which  procedure  is  better.  The  conclusion  is  that  if  there  is  evidence  that 
a deterministic  trend  is  in  the  time  series,  it  will  generally  give  better  results  if 
such  a trend  is  first  estimated  and  removed  from  the  given  series  before  applying 
the  Box  and  Jenkins  approach  to  the  residual  series  than  if  the  series  is  differenced 
before  the  approach  is  applied.  If  there  is  no  reason  to  believe  the  existence  of  a 
trend  component,  the  differencing  transformation  may  give  a good  result,  but  if 
another  transformation  method  gives  a more  reasonable  interpretation  and  better 
forecasting  properties,  then  this  method  should  also  be  considered. 

2.2.3  ARARMA  Models 

Parzen  (1982)  proposed  an  ARARMA  model  to  fit  the  nonstationary  time 
series.  The  model  consists  of  a nonstationary  AR  followed  by  a stationary  ARMA 
model.  The  Box  and  Jenkins  ARIMA  approach  is  a sepcial  case  of  ARARMA 
process  in  which  the  transformation  is  constrained  to  be  pure  differencing 
operators.  The  model  can  be  described  as  follows: 

Vt  = yt-  H <t>ryt-T, 

P 9 

= ^^PkZt-k,  where  Zt  is  white  noise. 

3=1  fc=0 

The  model  was  compared  with  Box  and  Jenkins  for  the  well  known  international 
air  lines  data  with  respect  to  the  forecasting  mean  square  error  (MSE)  and  mean 
average  percentage  error  (MAPE).  The  results  showed  that  the  ARARMA  model 
is  superior  to  the  one  recommended  by  Box  and  Jenkins,  especially  for  the  further, 
i.e.,  more  step  ahead,  forecasts. 
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2.2.4  Kalman  Filter 

Another  well  known  procedure  is  using  the  state  space  filtering  algorithm 
(Kalman  1960)  and  was  discussed  by  Meinhold  and  Singpurwalla  (1983)  in  some 
statistical  aspect.  The  basic  idea  of  this  approach  was  motivated  by  the  updating 
feature,  and  its  derivation  followed  via  the  least  squares  estimation  theory.  In  this 
method,  an  observation  equation  and  a system  equation  are  used.  Let  yi,y2,  ■ ■ ■ ,Vt 
be  the  observed  time  series  and  yt  is  assumed  to  depend  on  an  unobservable 
quantity  Xt,  known  as  state  of  nature.  The  relationship  between  yt  and  Xt  is  linear 
and  is  specified  by  the  observation  equation  as 

yt  = HtXt  + bt,  « = 1,2,... 

where  Ht  is  a known  quantity  and  bt  is  the  observation  error  which  is  assumed  to 
be  normally  distributed  with  mean  zero  and  a known  finite  variance. 

The  main  difference  between  Kalman’s  state  space  filtering  method  and  the 
traditional  linear  model  is  that  the  state  of  nature  (xt)  in  Kalman’s  method,  which 
corresponds  to  the  regression  coefficients  of  the  linear  models,  is  not  assumed  to  be 
a constant  but  may  change  in  time.  This  dynamical  feature  of  Xt  is  represented  by 
a system  equation  as 

Xt  — — 1 t — 1, 2, . . . 

where  Ft  is  a known  quantity  and  Oj  is  the  system  equation  error  which  is  assumed 
to  be  normally  distributed  with  mean  zero  and  a known  finite  variance. 

In  this  algorithm,  there  are  two  major  assumptions:  (1)  Ht  and  Ft  have 
to  be  specified,  (2)  the  assumptions  about  the  statistical  characteristics  of  the 
dynamical  error  terms,  {bt}  and  {at}  have  to  be  made  such  as  normality  and 
independence.  However,  in  many  applications,  the  validity  of  these  assumptions  is 
not  checked. 
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2.2.5  Dynamical  Stochastic  Approximation  Method 

Rao  et  al.  (1992)  proposed  a two-step  approximation  procedure  to  predict 
a nonstationary  time  series  called  the  dynamical  stochastic  approximation  (SA) 
method.  The  first  step  of  this  method  is  to  make  a correction  for  the  time- vary 
trend  according  to  the  observed  data,  and  the  second  step  is  to  predict  the  process 
by  using  a stochastic  approximation  method  based  on  the  new  observations.  The 
algorithm  of  this  method  is  described  as  follows: 

1.  Given  a time  series  yi,y2,  • ■ • ,Vt,  the  prediction  for  yt+i  is  estimated  by  using 
only  the  historical  data  yt,  yt-i,  • • • , J/i-  The  observed  value  of  yt  is  assumed  to 
consist  of  a deterministic  component  9t  and  a random  component  Zt  as  of  the 
form 

yt  = + Zt,  where  {^t}  is  a sequence  of  white  noise. 

2.  The  deterministic  component  is  estimated  by 

= (1  + 1 ^)9t  + 0{t  ^),y>  > 1, 

where  O(-)  is  a term  indicating  the  order  of  magnitude  (Dupac  1965). 

3.  An  example  of  9t  could  be:  9t  = a\t  + uq,  where  ao  and  ai  are  constants  which 
may  be  estimated  by  fitting  the  linear  regression  model  on  the  historical  data. 
After  obtaining  oq  and  oi,  one  can  write:  9t-^.l  = 9t  + ai. 

4.  Let  yt+i\t  be  the  predicted  value  of  yt+i  given  data  yt,  yt-i,  ■ • ■ , 2/i,  and  yt+i\t-i 
be  the  predicted  value  of  yt+i  given  data  yt-i, . . . ,yi,  then  the  two-step 
dynamic  stochastic  approximation  method  is 

step  1 Predict  yt+i  given  . . . , yi  by  yt+i\t-i  = yt\t-i  + Oi- 
step  2 Use  yt  to  adjust  the  prediction  of  yt+i\t  by 


yt+i\t  — yt+i\t-i  + 7t(2/t  ~ yt+i|t-i)) 
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where  jt  is  a sequence  of  positive  numbers  satisfying  three  conditions: 

t t 

(i)  lim  7t  = 0;  {ii)  V 7^  = oo;  {in)  V 7?  < 00. 

t-*oo  ^ ^ ' 

»=1  »=1 

The  interpretation  of  this  method  is  that,  at  time  t + 1,  we  try  to  obtain 
an  estimate  yt+i\t  for  the  value  of  9t+i  which  is  the  predicted  value  for 
yt+i-  The  first  step  is  to  maJce  a correction  for  the  time- varying  trend 
and  the  second  step  is  to  apply  an  ordinary  stochastic  approximation 
procedure  by  using  the  current  observation  yt.  The  sequence  {74}  is 
usually  chosen  to  minimize  the  mean  square  error  E[{yt\t~i  — OtY]  (Chien 


and  Fu  1969)  as 


Tt*  = 


(1+^ 

{t  + l)(t  + 2){2t  + 3)/6  + {a^/vf  - 1) ’ 


where  > var(zt)  and  vl  is  the  MSE  of  the  initial  estimate,  i.e.. 


vl  = E[{y,^o  - 9i)^ 


A modified  dynamic  SA  algorithm  was  also  proposed  by  Rao  et  al.  (1992), 
which  keeps  the  first  step,  but  changes  the  second  step  by 

yt+i\t  = yt+i\t-i  + oct{yt  — yt+i\t-i),  where 

= ll,  0L2  = 72.  = 7g(t),  i > 3 

t 

g{t)  = 2 + '^f[{yi\i^i  - yi\i-2){yi-i\i-2  - yi-l\i-^),  and 

i=3 

{1  if  X < 0 
0 if  X > 0 

In  this  method,  at  depends  on  the  observed  and  predicted  yt  values.  Therefore,  the 
forecasts  adapt  themselves  to  the  time  history  of  the  data.  Consequently,  they  are 
more  sensitive  to  variations  in  the  data  than  7^  which  depends  mainly  on  t. 
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In  conclusion,  stochastic  approximation  method  could  be  useful  in  predicting 
the  nonstationary  time  series  which  is  fluctuate  around  the  trend  line  such  as 
climatic  time  series.  Comparing  with  the  moving  average  (MA)  process,  which  is 
also  used  to  track  trends  in  the  time  series,  the  SA  algorithm  has  all  the  advantages 
of  MA  process  without  the  disadvantages  such  as  subjective  selection  of  order. 


2.2.6  Adaptive  Forecasting  Method 

Grillenzoni  (1998)  developed  an  adaptive  forecasting  method  for  a 
nonstationary  time  series  based  on  the  optimization  of  recursive  estimators.  This 
method  is  useful  for  forecasting  time  series  with  trends  and  cycles  whose  pattern 
changes  over  time.  The  idea  is  that  the  parameters  in  the  models  are  estimated  by 
means  of  a recursively  optimal  algorithm.  For  example,  if  the  series  is  an  AR(p) 
process,  as 


yt  = + zt,  where  (f>2t,  • • • , <l>pt]  and  x\  = [yt-i, . . . , yt- 


pj) 


then  is  estimated  by 


= Pt-i  + attXt[yt  - 00  = 00  and 

•p  1 -p  . T f'  r 

r*  — yr*-!  - /^[i  , 'p — + Ti-fp)  To  - 7o/p 

In  this  recursive  algorithm,  0 < (A, /x)  < 1,  and  0 < (a,  71)  < 00  are  adaptation 
coefficients  and  {0q,'Yo)  are  initial  values.  The  choices  of  these  coefficients  are 
such  that  the  loss  function  based  on  the  prediction  errors  is  optimized;  i.e.,  choose 
[d,  A,/t,7i,/9o,7o]  to  minimize 


N 

Qn  = ^ ] \yt  ~ 

t=p+i 

This  algorithm  was  compared  with  the  constant  parameter  models  through  the 
applications  on  airlines  and  IBM  stock  time  series  data  sets.  The  results  showed 
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that  the  recursive  estimator  models  axe  better  than  the  constant  parameter  models 
either  in  fitting  or  in  forecasting. 

2.2.7  Whittle  Pseudo-Maximum  Likelihood  Estimation 

Velasco  and  Robinson  (2000)  proposed  a Whittle  pseudo-maximum 
likelihood  estimation  for  nonstationary  time  series.  They  argued  that  this  method 
can  be  applied  to  any  degree  of  nonstationarity  d > 0.5,  where  d is  the  memory 
parameter,  without  a priori  knowledge  of  the  memory  length  of  the  series.  The 
method  estimates  the  parameters  in  a discrete  frequency-domain  Whittle  objective 
function,  Qn{0),  by  minimizing  Qnifi),  where 


The  Yhjij,)  ^ sum  over  j = p,2p, . . . ,{n  — p),  assuming  that  n/p  is  an  integer,  and 
P{Xj)  is  denoted  as  the  tapered  periodogram  with  a taper  of  order  p. 

This  estimation  was  compared  with  the  time-domain  maximum  likelihood 
estimates  with  respect  to  the  bias  and  the  standard  error.  The  results  of  the 
simulated  ARIMA  process  with  different  values  of  d showed  that  the  parameter 
estimation  and  its  standard  error  by  the  Whittle  pseudo-maximum  likelihood 
procedure  are  more  consistent  than  the  ones  by  the  time-domain  maximum 
likelihood  method. 


In  1848,  Rudolf  Wolf,  a Swiss  astronomer,  introduced  a method  of 
enumerating  the  sunspot  activity  in  a systematic  way.  Prom  the  daily  data, 
monthly  data  has  been  obtained  as  far  back  as  1749.  However,  the  yearly  sunspot 
data  set  was  mainly  analyzed  in  the  statistical  literature. 

The  problem  of  modeling  Wolf’s  sunspot  time  series  has  intrigued  many 
investigators  in  the  last  few  decades.  Yule  (1927)  analyzed  this  series  by  an 


2.3  Analysis  of  Wolf’s  Sunspot  Time  Series 
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autoregressive  process.  Moran  (1954),  Schaerf  (1964),  Bailey  (1965),  Phadke  and 
Wu  (1974),  Box  and  Jenkins  (1976),  and  Morris  (1977)  applied  autoregressive  or 
autoregressive-moving  average  process  to  model  this  time  series.  The  reason  why 
the  ARMA  models  are  attractive  for  analyzing  this  data  is  the  disturbed  periodic 
behavior  in  this  series.  Woodward  and  Gray  (1978)  used  the  R-  and  S-array 
method  of  model  identification  (Gray,  Kelley  and  Mclntire  1978)  to  obtain  new 
ARMA(8,1)  models.  These  models  were  based  on  the  classical  176  point  data 
set  for  the  years  1749  ~ 1924.  Prom  the  results  of  model  identification  by  Gray, 
Kelley,  and  Mclntire,  they  suggested  that  the  series  should  be  a nonstationary  or 
nearly  nonstationary  ARMA(8,1).  After  employing  the  Gray,  Kelley,  and  Mclntire 
procedure  to  estimate  the  parameters  of  the  model,  the  resulting  model  was 
obtained  as 

(1  - 1.64B  + B^)(l  - 0.29B'‘  - 0.215®  - 0.195®)^*  = (1  - QMB)zt, 

where  Xt  = yt  — y,  yt  is  the  observed  sunspot  numbers,  B'^Xt  = Xt-k,  and  Zt  is  white 
noise.  Another  model  was  also  obtained  on  the  basis  of  examination  of  the  models 
which  have  roots  close  to  the  unit  circle,  i.e.,  nearly  nonstationary  models.  The 
resulting  series  was  again  modelled  as 

(1  - 1MB  + 0.9452)(i  _ 0.18B  - 0.025^  - O.OIB^  - 0.265^  - O.ISS^  - 0.16B®)it  = (1  - 0.58B)zf 

These  two  models  were  compared  with  the  ARMA (2,0)  model  proposed 
by  Yule  (1927),  Box  and  Jenkins  (1976),  which  is  the  most  widely  known  ARMA 
model  for  the  sunspot  data  analysis.  They  claimed  that  the  new  ARMA  models 
provide  marked  improvement  over  the  ARMA(2,0)  model  with  respect  to  (1) 
the  similarity  between  sample  correlation  functions  from  the  models  and  from 
the  original  sunspot  data,  (2)  forecast  performance,  (3)  comparison  of  spectral 
densities,  and  (4)  comparison  of  generated  realizations  with  the  sunspot  series. 
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Lomb  and  Andersen  (1980)  proposed  seasonally  stationary  autoregressive 
models  to  make  forecasts  for  the  current  sunspot  cycle  by  assuming  that  the 
sunspot  cycle  is  changing.  They  showed  that  these  models  considerably  improve 
over  other  autoregressive  models  by  comparison  of  forecasts.  The  approach  is  an 
extension  of  the  method  proposed  by  McNish  and  Lincoln  (1949),  as 

R^  = Rn-\-  = ^ + A:„_i  AiZ„_i, 

where  is  the  value  to  be  predicted  in  a particular  cycle  for  the  year  after 
a local  minimum;  Rn  is  the  average  value  of  all  the  values  in  the  preceeding 
cycles;  ^Rn-\  is  the  difference  of  Rn-i  and  Rn-i,  and  k„_i  can  be  different  for 
each  n.  They  then  define  Xij  to  be  the  observation  of  the  cycle,  and  a series 
of  linear  models  is  constructed  by  relating  Xij  to  and 

Two  sets  of  models  were  calculated,  one  using  i measured  from  the  local  minimum, 
the  other  from  the  local  maximum  of  a cycle.  They  concluded  that  the  forecasts 
produced  with  cycles  beginning  at  a maximum  are  sequentially  superior  to  both 
McNish-Lincoln  and  Box-Jenkins  forecasts. 


CHAPTER  3 

PATTERN  MATCH  REGULARITY  SCORES:  A COMPARISON  WITH 

APPROXIMATE  ENTROPY 


3.1  Concept  of  Pattern  Match 

For  a given  time  series  data,  it  is  always  important  to  know  how  reg- 
ular/complex this  time  series  is.  In  this  chapter,  motivated  by  the  algorithm 
for  calculating  approximate  entropy  (ApEn)  (Pincus  1991),  two  new  scores  for 
quantifying  the  regularity  of  any  given  time  series  are  introduced. 

Recall  that  the  calculation  of  ApEn  is  based  on  a conditional  probability 
of  the  next  corresponding  points  being  value  matched  given  that  the  previous  I 
corresponding  points  are  all  value  matched,  for  a fixed  integer  1.  That  is,  for  any 
subsequence  of  length  I in  the  given  time  series  U,  estimate 

Pr{  difference  of  the  next  points  of  Xi  and  Xj  < r 
I Xi  and  Xj  are  value  matched  }, 

where  value  match  is  defined  as  the  maximum  difference  between  the  corresponding 
points  of  two  subsequences  is  less  than  a critical  value  r.  One  main  disadvantage 
of  ApEn  is  that,  for  a given  time  series  with  no  further  information,  it  is  almost 
impossible  to  know  how  to  choose  its  parameters,  the  length  of  subsequences  (/)  and 
the  filtering  critical  value  (r).  For  different  selections  of  parameters  I and  r,  ApEn 
could  give  very  inconsistent  results,  even  when  the  choices  are  all  in  the  reasonable 
range.  One  reason  of  this  inconsistency  could  be  because  this  value  match  criterion 
is  very  sensitive  to  the  critical  value  r and  hence  the  degree  of  difficulty  to  satisfy 
the  value  match  criterion  increases  fast  even  a small  increase  of  1.  Besides,  even 
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when  two  subsequences  are  value  matched  to  each  other,  they  may  have  very 
different  patterns,  which  could  be  considered  a meaningless  match  in  practice.  For 
example,  when  the  noise  portion  in  a time  series  has  much  samller  variation  than 
the  other  meaningful  portion,  the  value  match  criterion  may  have  more  matches 
from  the  subsequences  in  the  noise  part.  For  these  reasons,  we  propose  to  use  the 
criterion  of  pattern  match  instead  of  value  match  to  evaluate  the  regularity  of  a 
time  series.  The  following  definition  defines  the  criterion  of  pattern  match. 

Def.  3.1:  Suppose  that  U = {u\,U2,  ■ ■ ■ , Un}  is  the  time  series  to  be  investigated, 
and  let  be  the  sample  standard  deviation  of  U.  For  a fixed  integer  /, 
define  a series  of  subsequences  of  U such  that 

Xi  = {ui,Ui+i, 1 < i < n - / + 1. 

Then  for  a given  positive  real  number  r (e.g.,  r = 0.2d„),  Xi  and  Xj  axe 
considered  pattern  I -matched  to  each  other  if: 

1.  |itj  — Uj\  < r,  < r,  and 

2.  for  A:  = 1, ...,/-  1,  sign(itj+fc  - Ui+k-i)  = sign(uj+fe  - Uj+k-i)- 

The  first  condition  of  this  criterion  means  that  the  beginning  points  and  the  end 
points  of  two  subsequences  must  be  valued  matched;  i.e.,  two  subsequences  have 
to  be  approximately  at  the  same  range.  The  second  condition  indicates  that 
the  points  in  between  must  have  the  same  pattern,  i.e.,  pattern  matched  to  each 
other.  Obviously,  this  matching  scheme  decreases  the  degree  of  dependence  on  the 
parameters  I and  r since  the  exact  value  match  for  the  corresponding  points  of 
the  two  subsequences  is  not  required  as  for  calculating  ApEn.  Figure  3.1  shows 
an  example  of  this  advantage  of  pattern  match.  Clearly,  Xi  and  Xj  have  very 
similar  structure  to  each  other  and  will  be  accepted  by  the  pattern  match  criterion 
according  to  its  definition.  But  if  considering  the  values  for  each  corresponding 
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points,  because  the  fourth  and  the  fifth  corresponding  points  of  Xi  and  Xj  are  not 
matched  very  well,  they  may  not  accepted  by  the  value  match  criterion  depending 
on  how  large  the  critical  vale  r is.  In  the  next  section,  we  will  introduce  two 
new  scores  to  quantify  the  regularity  of  a time  series  based  on  this  pattern  match 
scheme. 


Figure  3.1:  An  example  of  good  pattern  match  but  not  value  match. 

3.2  Pattern  Match  Regularity  Scores 
To  investigate  how  regular/complex  a given  time  series  is,  or  to  compare 
the  degree  of  randomness  among  several  time  series,  one  usually  applies  a score 
to  quantify  the  regularity  of  time  series.  In  this  section,  based  on  the  definition 
of  pattern  l-match,  we  propose  two  new  regularity  scores.  The  algorithm  for 
calculating  the  first  regularity  score  (Scorel)  is  very  similar  to  the  algorithm  for 
calculating  ApEn,  but  changes  the  value  match  criterion  to  pattern  match  scheme  of 
Def.  3.1. 
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Let  U = {u\,U2, . . . , Un}  be  the  time  series  to  be  investigated.  For  given 
a fixed  integer  I,  form  a series  of  subsequences  Xi,  X2,...,  x^^i  of  length  I in 
where 

X{  — {Uj,  . . . , , 1 ^ Z ^ 71  1. 

Similar  to  ApEn,  when  comparing  a subsequence  Xj  with  all  subsequences  Xj’s 
in  U,  the  idea  of  Scorel  is  based  on  the  conditional  probability  that  the  next 
points  of  Xi  and  Xj  have  the  same  change  of  sign,  i.e.,  sign{ui+i  — = 

sign{uj^i  — given  that  x,-  and  Xj  are  pattern  l-matched  to  each  other.  That 

is,  for  each  subsequence  x<  of  length  /,  define 

Pi  = Pr{sign{ui+i  — Uj+j-i)  = sign{uj+i  — | Xj  pattern  l-matched  with  Xj} 


_ Pr{xj  pattern  l-matched  with  x<,  sign{ui+i  — = sign{uj^i  — 

Pr{xj  pattern  l-matched  with  Xj} 

where  Xj  is  any  subsequence  of  length  1. 

Then,  by  the  given  time  series  U of  n points,  for  1 < z < n — we  can 
estimate  p,-  by  using  the  subsequences  Xi,  X2, . . . , x„_j  in  17  as 


Pi  = 


of  Xj ’s  pattern  l-matched  with  Xj  and  sign{ui+i  — Uj+j-i)  = sign{uj+i  — Uj+j-i) 


tJ  ofxj ’s  pattern  l-matched  with  x< 
Then  the  first  regularity  score  can  be  written  as 


1 

Scorel  = - 

Th  V 


t=l 


Intuitively,  when  the  given  time  series  is  more  regular,  pi  should  be  larger 
and  therefore  as  for  ApEn,  Scorel  should  be  decreasing  when  the  given  time  series 
is  more  regular/less  complex. 

The  second  new  regularity  score  (Score2)  is  also  based  on  the  idea  of  pattern 


l-match,  but  for  each  subsequence  Xj  in  the  time  series  U,  estimates  only  the 
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probability  that  a subsequence  Xj  of  length  I is  pattern  l-matched  with  x,-.  That  is, 
estimate 

Pi  = Pr{xj  pattern  l-matched  with  Xi} 


by 

^ Jt  of  Xj ’s  pattern  l-matched  with  X; 

Pi  — TT~i 

n — t + 1 

Then  the  second  regularity  score  is  defined  as 

n-J+l 

Score2  = ; In m 

n-l-\-l  ^ 

1 = 1 

Score2  basically  is  using  the  first  part  of  Scorel,  i.e.,  probability  of  pattern 
l-match,  therefore,  its  calculation  is  more  efficient  than  Scorel  or  ApEn  since  the 
estimation  for  the  probabilities  of  the  next  points  being  pattern  matched  is  not 
required.  Further,  since  a more  regular  time  series  should  have  more  occurrences  of 
pattern  l-match;  i.e.,  the  average  of  — Inpj  over  i should  be  smaller,  this  regularity 
score  should  have  the  same  characteristic  as  Scorel  and  ApEn.  That  is,  Score2 
should  be  decreasing  with  the  higher  degree  of  regularity. 

In  the  next  section,  we  will  use  an  example  to  compare  these  two  new 
regularity  scores  with  the  widely  used  ApEn. 

3.3  Example:  A Comparison  With  ApEn 

To  compare  the  two  new  regularity  scores  with  ApEn,  we  applied  all  three 
scores  on  a MIX(p)  process  which  was  introduced  by  Pincus  (1991)  to  illustrate 
the  regularity  in  correlated  stochastic  processes  with  a continuous  state  space.  The 
construction  of  MIX(p)  process  is  as  follows: 


1.  Fix  0 < p < 1,  define  Oj  = \/2sin(27rj7l2),  for  all  j = 1, 2, . . . 

2.  Let  bj  be  a family  of  independent  identically  distributed  (i.i.d.)  real  random 
variables,  with  uniform  density  on  the  interval  [— \/3,  \/3]. 
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3.  Let  Cj  be  a family  of  i.i.d.  random  variables,  cj  = 1 with  probability  p and 
Cj  = 0 with  probability  I — p. 

4.  Define  MIX(p)j  = (1  — Cj)aj  + Cjbj 

This  is  a family  of  stochastic  processes  that  samples  a sine  wave  when  p = 0 and 
samples  i.i.d.  uniform  random  variables  when  p = 1.  Figure  3.2  shows  200  points  of 
the  simulated  MIX(p)  processes  when  p = 0,0.25,0.50,0.75, 1.  Obviously,  MIX(p) 
process  becomes  more  random  as  p increases.  Furthermore,  the  MIX(p)  process 
has  mean  zero  and  variance  one  for  all  p,  which  means  that  the  means  and  the 
variances  cannot  be  used  to  distinguish  the  degree  of  regularity /randomness  for 
different  values  of  p. 

There  are  two  purposes  of  this  comparison.  First  is  to  compare  the  sensi- 
tivity with  respect  to  the  change  of  the  randomness  parameter  p.  Theoretically, 
the  three  compared  regularity  scores  should  all  increase  when  p becomes  larger. 

The  second  purpose  is  to  compare  the  consistency  of  the  scores  with  respect  to  the 
different  parameters  {I  and  r)  in  the  algorithms.  To  accomplish  these  two  purposes, 
curves  (over  p,p  = 0.00, 0.01, 0.02, . . . , 0.99, 1.00)  of  each  score  are  generated  for 
different  values  of  /(=  2,3, 4,5)  and  r(=  0.14,0.16,0.18,0.20,0.22).  Since  the  curves 
may  have  the  different  scales  and  we  are  only  interested  in  how  the  scores  change 
over  p,  for  the  purpose  of  comparison,  the  ranges  of  all  curves  are  normalized  from 
0 to  1.  Each  value  in  the  curves  is  based  on  an  average  of  10  values  from  Monte 
Carlo  simulation,  and  the  sample  size  for  each  simulation  is  1000  points. 

Figures  3.3,  3.4  and  3.5  show  the  curves  of  ApEn,  Scorel,  and  Score2 
values  over  p,  where  the  length  of  subsequences  I = 2,3, 4, 5,  and  the  filtering 
critical  value  r = 0.18  as  Pincus  (1991)  used  except  only  I = 2 was  shown  in  his 
paper.  He  claimed  that  the  degree  of  regularity  can  be  distinguished  by  ApEn  for 
0 < p < 0.70  when  I = 2.  But  clearly  when  different  values  of  I are  applied,  the 
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Figure  3.2:  MIX(p)  process  for  different  values  of  p. 
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results  are  very  unreasonably  inconsistent  (Figure  3.3).  The  results  for  regularity 
Scorel  are  much  more  consistent  than  ApEn  for  different  values  of  1.  However, 
it  can  only  distinguish  the  degree  of  regularity  when  p values  are  from  0 to  0.50 
(Figure  3.4).  As  for  the  regularity  Score2,  again,  the  results  for  different  values  of  I 
are  very  consistent  (Figure  3.5).  Furthermore,  the  ability  to  distinguish  the  degrees 
of  regularity  is  almost  the  same  as  in  the  best  case  of  ApEn  {1  = 2). 

Figures  3.6,  3.7  and  3.8  show  the  curves  of  the  three  scores  (ApEn,  Scorel 
and  Score2,  respectively)  for  different  values  of  r(=  0.14,0.16,0.18,0.20,0.22) 
but  fix  f = 3.  Again,  ApEn  shows  inconsistent  results  for  different  value  of  r. 

The  results  from  both  Scorel  and  Score2  are  very  consistent  because  the  pattern 
match  criterion  are  less  dependent  on  the  the  critical  value  r as  mentioned  earlier. 
Besides,  both  proposed  regularity  scores  distinguish  the  degrees  of  randomness 
better  than  ApEn. 
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P 


Figure  3.3:  Normalized  ApEn  versus  p in  the  MIX(p)  process  for  different  values  of 
I,  given  r = 0.18. 
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P 


Figure  3.4:  Normalized  Scorel  versus  p in  the  MIX(p)  process  for  different  values  of 
I,  given  r = 0.18. 
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P 


Figure  3.5:  Normalized  Score2  versus  p in  the  MIX(j?)  process  for  different  values  of 
/,  given  r = 0.18. 
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P 


Figure  3.6;  Normalized  ApEn  versus  p in  the  MIX(p)  process  for  different  values  of 
r,  given  / = 3. 
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P 


Figure  3.7:  Normalized  Scorel  versus  p in  the  MIX(p)  process  for  different  values  of 
r,  given  / = 3. 
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Figure  3.8:  Normalized  Score2  versus  p in  the  MIX(p)  process  for  different  values  of 
r,  given  / = 3. 
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3.4  Summary  and  Discussion 

In  this  chapter,  we  introduced  the  idea  of  pattern  match  criterion  to 
investigate  the  regularity/complexity  of  a given  time  series.  Based  on  the 
probability  of  two  subsequences  satisfying  this  criterion,  two  new  regularity 
scores  were  proposed  and  were  compared  with  the  widely  applied  Approximate 
Entropy,  ApEn  (Pincus  1991),  which  was  mainly  based  on  the  probability  of  two 
subsequences  matching  a value  match  criterion. 

There  are  two  main  parameters  in  both  pattern  match  and  value  match 
criterions,  one  is  the  length  of  the  subsequences  (/)  to  be  compared,  the  other  is  the 
filtering  critical  value  (r)  used  to  determine  whether  there  is  a match  or  not.  One 
difficulty  for  both  procedures  is  the  selections  of  these  two  parameters.  Without 
any  information  of  the  given  time  series,  it  is  almost  impossible  to  decide  what 
would  be  the  proper  values  for  the  selection  of  I and  r. 

By  a simulation  study  on  a stochastic  MIX(p)  process  (the  same  example 
Pincus  used),  we  compared  three  scores  (ApEn,  Score  1 and  Score2)  on  distin- 
guishing different  degrees  of  randomness  of  MIX(p)  process  for  different  choices 
of  parameters.  It  was  observed  that,  for  the  different  values  of  I and  r,  the  two 
scores  based  on  the  pattern  match  criterion  gave  much  more  consistent  results,  in 
both  I and  r,  than  ApEn  for  distinguishing  the  randomness  of  the  MIX(p)  process. 
In  addition,  both  proposed  scores  can  distinguish  the  process  well,  especially  for 
Score2,  it  consistently  performs  well  and  has  almost  the  same  results  as  in  the 
optimal  parameter  selections  {I  = 2,  r = 0.18)  for  ApEn. 

In  conclusion,  the  proposed  regularity  scores  based  on  the  pattern  match 
criterion  provide  more  reliable  results  to  distinguish  the  randomness  of  times  series 
than  ApEn  in  terms  of  the  uncertain  selections  of  parameters.  In  other  words,  as 
long  as  the  selections  of  I and  r are  in  a reasonable  range,  the  proposed  regularity 
scores  should  be  more  reliable  than  ApEn  for  quantifying  the  regularity/complexity 
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of  a time  series.  In  addition,  as  a byproduct  of  the  pattern-match  procedure,  if  the 
consistently  matched  patterns  can  be  identified  with  statistical  significance,  because 
(1)  the  procedure  is  more  consistent  with  different  settings  of  the  parameters  and  (2) 
the  value  match  criterion  is  more  likely  to  identify  the  patterns  in  the  noise  part  of 
the  time  series,  these  patterns  could  be  more  useful  in  practice  for  understanding 
the  characteristics  of  the  given  time  series  and  making  better  future  predictions 
than  the  ones  by  the  value  match  criterion. 


CHAPTER  4 

PATTERN  MATCH  SIGNAL  IDENTIFICATION  (PMSI)  ALGORITHM 


4. 1 Introduction 

Most  of  the  methods  for  time  series  analyses  try  to  find  the  best  model 
to  fit  the  data  in  a learning  period  and  apply  this  model  to  predict  the  values 
in  the  future  series.  However,  many  time  series  only  contain  a few  meaningful 
or  predictable  patterns  and  the  proportion  of  the  noise  part  in  the  time  series  is 
high.  In  such  cases,  the  predictions  over  time  may  not  be  of  interested.  The  more 
important  task  could  be  the  identification  of  the  most  predictable  (or  consistent) 
patterns  in  a given  time  series  and  the  use  of  the  identifiable  patterns  for  making 
reliable  predictions  to  the  future.  In  other  words,  the  forecasting  in  these  cases  is 
mainly  interested  in  the  occurrences  of  some  specific  events  which  follow  a pattern 
of  signals  being  observed.  Since  the  majority  of  the  time  series  is  unpredictable 
noise,  the  traditional  time  series  models  such  as  autoregressive  (AR)  models  which 
are  constructed  to  fit  over  the  entire  time  series  will  fail  to  accomplish  this  task. 

Take  the  stock  market  as  an  example.  If  an  investor  is  interested  in 
investing  on  the  stocks  of  a company,  the  most  important  question  he/she  would 
like  to  ask  is  when  should  be  the  best  time  to  buy  or  sell  the  stocks.  We  know  that, 
for  most  of  the  time,  the  price  of  a stock  is  unpredictably  up  and  down.  Therefore, 
if  we  fit  a model  on  the  entire  learning  period,  it  will  not  help  the  investor  to 
make  decisions  when  to  buy  or  sell  the  stocks.  However,  if  we  can  apply  some 
statistical  method  to  identify  the  most  predictable  pattern  of  this  stock  price  in 
a learning  period,  it  should  be  possible  that  this  pattern  will  remain  at  least  in 
a near  future  time  period  so  that  the  investor  will  have  a higher  chance  to  earn 
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and  less  chance  to  lose.  For  example,  in  a learning  period,  if  the  most  consistent 
pattern  is  identified  such  that  after  the  stock  price  goes  down  consecutive  five  days, 
it  will  be  going  up  next  three  days,  the  investor  should  have  better  chance  to  earn 
money  in  the  future  if  he/she  follows  this  pattern  before  he/she  buy  the  stocks. 

The  same  idea  can  also  be  applied  on  some  medical  data  (e.g.,  heart  rate  time 
series  and  epileptic  electroencephalogram  (EEG)  time  series)  or  earth  science  data 
(e.g.,  earthquake  time  series).  We  should  expect  that  this  method  works  better  in 
the  latter  time  series  than  in  stocks. 

In  this  chapter,  we  will  introduce  a Pattern  Match  Signal  Identification 
(PMSI)  algorithm  to  identify  the  most  predictable/consistent  pattern  of  signals 
in  a given  time  series.  First  we  will  describe  the  PMSI  algorithm  in  section  4.2. 

In  section  4.3,  we  will  give  an  analytic  proof  for  the  pattern  identifiability  by 
the  algorithm  for  stationary  AR  time  series  and  a special  case  of  nonstationary 
time  series,  random  walk.  Results  of  some  applications  of  the  algorithm  on  the 
simulation  data  and  real  time  series  (epileptic  EEG  time  series  and  Wolf’s  monthly 
sunspot  time  series)  will  be  given  in  section  4.4.  Summary  and  discussion  is  in  the 
final  section  4.5. 

4.2  Pattern  Match  Signal  Identification  (PMSI)  Algorithm 
The  PMSI  algorithm  provides  a statistical  method  to  identify  the  most 
predictable  pattern  (of  length  1)  in  a given  time  series.  In  this  section,  we  will 
describe  the  concept  of  this  algorithm. 

For  a given  time  series  U = {iti,  U2, . . . , form  a series  of  length  I 
subsequences  in  U,  Xi,  X2,...,  JCn-j+ii  where 

Xi  = {Uj,  Uj+i, . . . , 1 < i < n - / + 1. 

Let  I = li  + I2,  and  for  any  two  subsequences  X{  and  Xj,  similar  to  Def  3.1,  we  have 
following  pattern-match  definitions. 
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Def.  4.1  Xi  and  Xj  are  considered  pattern  li-matched  if 

|ui  — Uj\  < r and  < r, 

sign{ui+k  ~ = sign{uj+k  - Uj^k-i),  for  fc  = 1, 2, . . . , Zi  - 1. 

Def.  4.2  Xi  and  Xj  are  considered  pattern  l^-matched  if 

sign{uij^i^j^k  — = sign(uj^i^^k  ~ 

for  A:  = 0, 1, . . . ,/2  — 1- 

Def.  4.3  Xi  and  Xj  are  considered  pattern  edge-matched  if 

sign{ui^i^—i_  Ui^i^—2)  — sign{uj^i^—\  Uj+jj_2)- 

Def.  4.4  Xi  and  Xj  are  considered  pattern  I' -matched  if  and  Xj  are  pattern 
li-matched  and  pattern  l^-matched,  i.e., 

|ui  - Wjl  < r and  - Uj+h-i|  < r, 

sign{ui+k  ~ «i+fc-i)  = sign{uj+k  ~ Uj-\-k-i),  for  fc  = 1, 2, 1. 

Def.  4.5  Xi  and  Xj  are  considered  pattern  {I2  + 1)* -matched  if  Xi  and  Xj  are 
pattern  edge-matched  and  pattern  l^-matched,  i.e., 

sign{ui^l^+k-l  ~ Ui+h+k-2)  = sign{uj^i^^k-i  ~ Uj+h+k-2), 

for  A:  = 0, 1, . . . ,/2- 

The  PMSI  algorithm  is  a statistical  test  procedure  based  on  the  probabilities 
of  these  pattern  matches.  Also,  we  can  see  that  the  endpoints  value  matches  are 
only  required  when  the  pattern  matches  include  the  first  li  points  (/i-matching 
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part),  and  clearly, 

{pattern  I’-match}  C {pattern  li-match}  C {pattern  edge-match} 

{pattern  V-match}  C {pattern  {I2  + \)* -match}  C {pattern  edge-match} 
{pattern  I’-match}  C {pattern  l^-match} 

A graph  representation  of  examples  for  these  pattern  *-matches  is  shown  in  figure 
4.1. 

Now,  using  subsequences  ®i,  X2, . . . , Xn-i+i,  for  each  i,  1 < i < n - / + 1,  let 
Pu  = Pi'{xj  pattern  l^-matched  with  Xi  | Xj  pattern  li-matched  with  X{} 

_ Pr{xj  pattern  V -matched  with  x^} 

Pr{xj  pattern  li-matched  with  x<} 

and 

P2i  = Pr{xj  pattern  l^-matched  with  Xj  | Xj  pattern  edge-matched  with  Xj} 

_ Pr{xj  pattern  {I2  + 1)* -matched  with  Xj} 

Pr{xj  pattern  edge-matched  with  x<} 

Then,  for  each  i,  1 < i < n — / + 1,  we  test  the  hypotheses: 

: Pit  = P2i 

versus  Hu  : pu  > P2i, 

For  each  i,l  < i < n - I -\- 1,  define 

Vu  = tl  of  ’s  pattern  V -matched  with  Xj 

'O'U  = tt  of  Xj ’s  pattern  li-matched  with  Xj 

V2i  = tt  0/  pattern  {I2  + 1)* -matched  with  Xj 

^2i  = tt  0/  Xj ’s  pattern  edge-matched  with  X{ 
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Figure  4.1:  Examples  (Z  = 7,h  = 4,  Z2  = 3)  for  pattern  ^-matches. 
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and  let 


Vu 

Pu  = 

5 

nu 

II 

CN 

— , and 

ri2i 

Vu  + V2i 

Pi  = 

nu  + U2i 

Then,  the  test  statistic  for  testing  i7o«  versus  Hu  is 

Pxi  P2i 

y/pii^-Pi)^  + :^,) 

Under  the  null  hypothesis  /fo»  : pu  = p^u  TSi  ~ N{Q,  1).  Then  the  most 
predictable  pattern  can  be  identified  as  the  pattern  of  the  subsequence  Xi  such  that 
T Si  has  the  smallest  p- value  of  all  i. 

There  are  a few  points  we  should  mention  about.  First,  for  identifying 
the  signals  of  length  I,  the  possible  number  of  patterns  is  which  means  that 
there  are  many  subsequences  of  length  I having  the  same  patterns.  We  use  i as 
the  index  of  subsequences  instead  of  the  index  of  patterns  because  in  practice, 
we  may  give  some  constraint  on  the  value  of  subsequences.  For  example,  we  may 
only  be  interested  in  the  subsequences  which  have  the  certain  value  of  range  or 
standard  deviation.  In  that  case,  the  subsequences  which  have  the  same  pattern 
may  have  different  probabilities  of  pattern  match  and  therefore  we  should  test  them 
separately.  But  if  we  do  not  give  any  constraint  on  the  values,  we  may  just  use  i as 
the  index  of  patterns,  which  means  we  will  only  test  the  hypotheses  times.  In 
the  proof  of  the  pattern  identifiability  by  PMSI  algorithm,  we  will  apply  the  more 
general  pattern  match  criterions  without  considering  any  constraint  on  the  values. 

Secondly,  suppose  that  k sets  of  hypotheses  have  been  tested,  and  let 
Pi,P2,  - ■ ■ ,Ph  be  the  p- values  of  these  tests  for  testing  i/ot  : Pu  = P2i  versus 
Hu  : Pu  > p2i,  then  we  say  that  the  most  predictable  pattern  in  this  time  series 
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data  U should  be  from  subsequences  which  has  the  smallest  p- value,  say  p*,  over  all 
k tests.  By  applying  the  Bonferroni  inequality,  we  can  define  a regularity  score  for 
this  pattern  as  — ln(A:  x p*).  The  regularity  score  of  a statistically  meaningful  signal 
or  pattern  should  be  greater  than  -ln(0.05)  = 3.0,  with  significance  level  a = 0.05. 

The  third,  the  validity  of  the  tests  is  based  the  assumptions  that  y-a  and 
j/2i  axe  two  independent  binomial  random  variables  and  the  strong  law  of  large 
numbers  (SLLN)  holds  such  that,  for  large  n, 

Pu  Pii  and  p2i^P2i- 

For  the  first  assumption,  we  know  that  pattern  li-match  is  a very  small 
subset  of  pattern  edge-match  (~  | of  the  subsequences  in  the  time  series)  and 
hence  we  can  assume  that  yu  is  almost  independent  of  p2i>  or  they  are  weakly 
dependent.  For  the  second  assumption,  since  we  know  that  the  elements  of  a 
time  series  are  not  i.i.d.  random  variables,  the  strong  law  of  large  numbers  can 
not  hold  automatically  in  this  case.  In  section  4.3,  we  will  first  check  the  validity 
of  the  strong  law  of  large  numbers  by  proving  Lemma  4.1  and  4.2.  The  pattern 
identifiability  by  PMSI  algorithm  for  autoregressive  (AR)  stationary  time  series  will 
then  be  proved  following  the  lemmas.  In  the  latter  part  of  the  section,  the  pattern 
identifiability  of  a special  case  of  nonstationary  time  series,  random  walk,  will  also 
be  proved. 

4.3  Analytic  Proof  for  the  Pattern  Identifiability  by  PMSI  Algorithm 

Suppose  a binary  time  series  V'-  = {v[^,  is  generated 

from  the  subsequences  Xi,X2,. . . , aJn-/+i  in  a stationary  Ai2(l)  time  series 
V = {ui,  V2,...,  Vn}  and  a pattern  j,  1 < j < 2*“^,  such  that 

\ 1 if  Xi  is  pattern  l-matched  with  a pattern  j 
I 0 if  Xi  is  not  pattern  l-matched  with  a pattern  j 
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Then,  in  the  first  part  of  this  section,  we  will  prove  that  the  strong  law  of  large 
numbers  holds  for  Vj,  1 < j < 2*“^.  That  is,  suppose 


Pj  = Pr{a  subsequence  is  pattern  l-matched  with  pattern  j},  then 


n-l+l  I 


n — I + \ 


Pji 


a.s. 


Lemma  4.1  Suppose  {wt}  is  a stationary  AR(1)  process,  i.e.,  Vt  = avt-i  + e*,  with 
€t  ~ iV(0,(7^)  and  |o!|  < 1.  Without  loss  of  generality,  let  event  A be 

A = {?;<+!  - Vi  > 0,  Vi+2  - Vi+i  > 0, . . . , Vi+i  - Vi+i_i  > 0}, 

i.e.,  A is  an  event  such  that  a subsequence  Xi  = {vj,  Vi+i, . . . , Vi+j_i}  has  a 
monotone  increasing  pattern. 

Similarly,  let  event  B be  an  event  such  that  Xi^k  has  the  same  pattern  as  Xi, 
i.e., 


~ ~ Vj+i;  > 0,  Vi+fc+2  ~ I’l+Jfe+l  > 0,  . . . , Vi^k+l  ~ f^i+k+l-l  > 0}) 

where  0 < P{A),P{B)  < 1.  Then  \cov{Ia,Ib)\  < cf^,  where  c and  p are 
positive  constants  and  p < 1,  and  Ia,Ib  ore  indicator  functions  of  events  A 
and  B. 

Proof: 

We  first  approximate  Vt  by  a finite  state  Markov  process  v^.  There  is 
no  practical  difference  between  Vf  and  Vt  because  we  can  approximate  Vt  to  the 
decimal  in  recording.  In  other  words,  the  value  of  Vt  we  can  realize  in  practice  is  a 
finite  state  Markov  process. 

Let  PaS  be  the  stationary  probabilities  for  the  Markov  process,  and  let  Pafe’s 
be  the  transition  probabilities  such  that 

Pr(y^  = Oo,  = Oi,  . . . , V^^i  = af)  = PaoPaoai  ■ ■ ‘Pai-ian 
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for  each  finite  sequence  Oq,  ax,...,ai  of  states. 

Assume  that  are  all  positive  such  that 

(fc) 

‘^k  = max  1^^^ — 1| 
o.<>  Pb 

is  finite,  where  p^j^’s  are  the  order  transition  probabilities. 

Now,  let  H\  be  a set  of  (/  + 1)— tuples  of  states  and  H2  be  another  set  of 
(Z  + 1)— tuples  of  states  so  that 

A = {(v- , v-^i)  E i/i}  and 

^ ^i*+fc+i)  ■ • ■ ) ^Uk+i)  ^ ^2}- 

Then,  according  to  Billingsley  (1968),  we  have 

lPr(A  nB)-  Pr(A)  • Pr[B)\  < J^PaoPooai  • • -Pa,_ia,  Ipitio  - P60IP6061P6162  • • •P6,_i6„ 

where  the  sum  extends  over  (oq,  oi, . . . , oj)  in  Hx  and  (bo,  61, ... , 6/)  in  ff2-  Since 
YlbPab  — 1 and  by  the  definition  of  i/jk,  again,  according  to  Billingsley,  we  obtain 

\Pr(A  n B)  - Pr(A)  • Pr(B)\  < • Pr(A)  < 

Since  the  transition  matrix  (pab)  is  irreducible  and  aperiodic, 

Pb  for  all  a and  b. 

Since  the  state  space  is  finite,  the  convergence  must  be  exponential  and  uniform  in 
a and  b (Doob  1953)  so  that 

V’fc  — > 0 as  A:  — >•  00, 

i.e.,  there  exists  positive  constants  c and  p,p<  \ such  that 


rpk  = c-  p’^. 
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Hence,  we  have 


cov{IaJb)\  = \Pr{AnB)~  Pr{A)-Pr{B)\ 


< V’fc  = c-P*'- 


□ 


Lemma  4.2  Follow  the  definitions  in  Lemma  4-i,  and  let 


A = {v.'+i  - Vi  > 0,  Vi+2  - ViAi  > 0,  . . . , Vi+i  - Vi+i-i  > 0}. 


Define 


r 

1 if/^  = l 

0 iflA,=0 


Then  the  strong  law  of  large  numbers  (SLLN)  holds  for  {n,-}. 
Proof: 


FYom  a lemma  proved  by  Birkel  (1992),  it  will  be  sufficient  to  show  that 

00  j 

j=i  «=i 

But  by  Lemma  4.1, 

\cov{v'i,Vj)\  = \cov{Ia„Iaj)\  < 
where  c and  p are  positive  constants,  p < 1.  Hence, 


= C-5]Jli^(l  + P + P^  + + 


Eoo 
3=1 


IzA 

1-p 


_c_/Y^oo  1 _ Y^oo  p>  \ 
1— pv2_/j=i  ^ Z—dj—x 
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But  YlT=i  ^ < oo  by  convergence  of  p-series,  and  < oo  since  the  radius  of 

convergence  of  the  power  series  p < 1,  we  have 

OO  j 
j=l  i=l 

i.e.,  Strong  law  of  larger  numbers  (SLLN)  holds  for  {v,'}.  □ 


Since  we  will  apply  the  SLLN  to  the  following  results,  all  convergence  are 
almost  surely  convergence  unless  stated  otherwise.  Now,  we  are  ready  to  prove  the 
pattern  identifiability  by  the  PMSI  algorithm  for  AR{1)  stationary  time  series. 

Theorem  4.1  For  a given  time  series  V = {vi,  where  V is  a stationary 

AR(1)  process,  i.e., 

vt+i  = avt  + et+i, 

|q:|  < 1. 

Suppose  that  after  embedding  signals  S = {si,  S2,  • • • , sj  into  V,  a new  time 
series  U = {ixi, U2,..., u„}  is  observed  as 


Ut  = < 


Vt  if  It  = 0 

Sit  if  ^ 0 
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where  It  6 {0, 1,2,...,/}.  Then  {It}  is  a markov  process  with  transition 
matrix 

^1- Pi  Pi  000.  ..0^ 

0 0 1 0 0 ...  0 

0 0 0 1 0 ...  0 


. . 0 

0 . . .1 

1-P2  P2  0 0 / 

\ / (l+l)x(l+l) 

Define  the  embedding  probability  Pe  = Pr{It  = 1),  and  let  I = /i  + I2,  then 
with  the  proper  range  of  Pe  and  by  the  method  of  pattern  * -match  criterion 
(*  = l',li,l2>^dge)  and  the  comparisons  of  the  conditional  pattern  match 
probabilities,  the  pattern  of  the  embedded  signals  (S)  can  be  identified. 

Proof: 


By  the  definition  of  Pe  and  the  transition  matrix  of  {It}, 

Pe  = Pr{It  = 1) 

= Pr(It  = 1,  Jt_i  = 0)  + Pr(It  = 1,  /t_i  = /) 

= Pr{It  = l|/t_i  = 0)  • Pr(/t_i  = 0)  + Pr(It  = l|/t_i  = /)  • Pr(/*_i  = /) 
= Pi  • (1  - / -Pe) +P2  -Pe 

Pe  = . < 7 (since  1 - P2  > 0). 

1 + / • Pi  - P2  / 

Therefore,  we  know  that  Pe  G (0,  }]. 

For  a given  length  I (length  of  the  embedded  signals),  let  pt{,pt2,  ■ ■ . ,p/x, 
be  the  possible  L{=  patterns  of  length  /,  and  let  p\  be  the  probability 
of  pattern  pt\,  i = 1, 2, . . . , L,  being  pattern  /'-matched  by  a subsequence 
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Xt  = {ut,ut+i,  • • • where  /^  = /j+i  = • • • = = 0,  i.e.,  is  the 

probability  that  pattern  pt\  is  pattern  T-matched  by  a length  I subsequence  in  a 
stationary  AR{1)  time  series. 

Divide  I into  li  and  I2,  and  for  each  i,l  < i < L,  let  pattern  pt[  contain 
sub-patterns  in  the  first  /i  points  and  in  the  last  {I2  + 1)  points.  Then  for 

any  subsequence  Xt  (as  defined  above),  define 

ph  = Pr{xt  pattern  li-matched  with  pattern 
and 

pb  = Pr{xt  pattern  {I2  + 1)* -matched  mth  pattern  ptb}, 

where  pattern  /i-match  and  {I2  + l)*-match  were  defined  in  Def.  4.1  and  Def.  4.5. 
Then,  let 

Pii  = Pr{xt  is  Zg-niatched  with  pt\  \ Xt  is  Zi-matched  with  pt\} 

_ Pr{xt  is  /'-matched  with  pt\} 

Pr{xt  is  /i-matched  with  pt\} 

= Pi/Pi, 

and 

P2i  = Pr{xt  is  /j-matched  with  pt\  \ Xt  is  ed^e-matched  with  pt\} 

_ Pr{xt  is  {I2  + l)*-matched  with  pt\} 

Pr{xt  is  edye-matched  with  pt\} 

= pb/0.5, 
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since  Pr{edge  — match}  = 0.5.  Also,  let 


yu  = number  of  x[s  /'-matched  with  pt\ 

Tiii  = number  of  x[s  /i-matched  with  pt[ 

P2i  = number  of  x[s  (^2  + 1) ‘-matched  with  pt\ 
U2i  = number  of  x[s  ed^e-matched  with  pt\ 


then,  by  the  Strong  Law  of  Large  Number  (SLLN)  proved  in  Lemma  4.2,  for 
sufficiently  large  n,  the  total  number  of  possible  x[s,  we  have 

Pli  — PU, 

nu 

p2i  — —>  Pu, 

U2i 


^ yu  + V2i  Pi  + Pj‘ 

nii  + n2i  p^+0.5' 

Now,  suppose  m(sa  n • Pe)  signals,  with  pattern  ptj  and  its  corresponding  pt^^ 
and  pt^j  sub-patterns,  have  been  embedded,  then  because  of  the  exact  match  of  the 
embedded  signals  with  patterns  ptj,pt\^  and  pt‘f,  for  large  n,  we  have 

yij  ^ m+{n-m)p‘j  = n • [p^  + (1  - Pe)pJ] 

nij m + {n  - m)p^l  = n • [pe -I- (1  - Pe)pJ‘] 

^ ^ Pe  + (1  - Pe)pJ 

nij  + PiJ 


V2j  m + (n  - m)pj"  = n • [p^  + (1  - Pe)p^] 

U2j  m + {n  — m)0.5  = n • [pe  -I-  (1  - Pe)0.5] 


_y^.  Pe  + (1  - Pe)pf  _ * 
naj  Pe  + (1-Pe)0.5 
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and 


Then 


Pj  = 


(Ti  = 


Vlj  + V2j  “^Pe  + - Pe){Pj  + Pj)  ^ * 

nij  + ri2j  2pe  + {l-  p,){p^l  + 0.5)  “ 


\ ^Ij  ^2j 


y/^Y^ 


Pji^-Pj) 


+ 


Pe  + (1  - Pe)Pj  Pe  + (1  - Pe)0.5 


_ Plj  - P2j  Plj  - P*2j  _ 


TSj  = ^ 


(7j  'T*  J 


= (Ti 


Hence,  for  any  other  length  I pattern  pt\  (with  pt'^,pt!^),  i ^ j,  it  will  be 
sufficient  to  find  the  range  of  Pe  such  that 


TSj  > TSi,  for  all  i ^ j, 

i.e.,  for  a proper  range  of  embedding  probability  p^,  the  pattern  of  the  embedded 
signals  can  be  identified  by  the  PMSI  algorithm. 

Now,  for  pattern  pt\  ^ ptj,  there  are  five  cases  should  be  considered: 
case  I:  7^  pt'j,pt'i  ^ pt^j  and  not  edge-matched,  then,  for  large  n, 


Vu  (n  - m)p\ 
nij  ->•  (n  - m)p\^ 


n • (1  - p^)p\ 
n • (1  - pe)p|' 


V\i 


4 


P\i-  > -T  = 


Tin 


Pi 

h =P^i 
Pi 


V2i  (n  - m)pf 
n2i  — ^ (n  — m)0.5 


..  V2i 
P2i  = — 
ri2i 


n-  (1  -pe)pf 

n • (1  - pe)0.5 
0.5 


47 


and 


Pi  = 


_ Vli  + V2i  Pi  + P^i 

^li  + ^2i  p|‘  + 0.5 


= Pi 


(Ti  = 


Then 


2^^.  _ ^ Pii  P2t T’5'* 

ai  0-;  ~ ’ 

case  II:  pt^>  ^ pt‘j,pti^  / but  edge-matched,  then,  for  large  n, 

Vii  -^{n-  m)p\  = n • (1  - pe)p\ 
nii  ^ (n  - m)p^  = n • (1  - p^)p‘^ 

, . Pi  * 

Pii  -j7  = Pii 
Pi 

P2i  ->  (n  - m)pf  = n • (1  - Pe)pf 
n2i  ->  m + (n  - m)0.5  = n • [pe  + (1  - Pe)0.5] 

(1  - Pe)Pi 


and 


P2i 


Pe  + (1  - Pe)0.5 


= P2i 


Pi 


(l-Pe)(Pi+ph  ^ * 
Pe  + (l-Pe)(p!‘+0.5) 


1 


((1  -Pe)p!‘ 


+ 


Pe  + (1  -Pe)0.5 


= CTi 


TSi  ^ = T5; 


Then 
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case  III:  pt^>  = pt^>  (automatically  edge-matched),  pt^^  ^ pt\?,  then,  for  large  n, 

yu  ->{n-  m)p[  = n • (1  - pe)p\ 
nii  ->•  m + (n  - m)p‘^  = n • [pe  + (1  - Pe)Pi] 

{l-Pe)Pi 


Pu 


Pe  + (1  - Pe)p- 


h=Pu 


V2i  (n  - m)pf  = n ■ (1  - pe)pf 

ri2i  ->  m + (n  - m)0.5  = n • [pe  + (1  - Pe)0.5] 

(l-Pe)pf 


P2i 


Pe  + (1  - Pe)0.5 


= P*2i 


and 


Pi 


(Ti 


(1-Pe)(p'+Ph 
2pe  + (l-pe)(p|‘+0.5) 


Pi 


(pe  + (1  -Pe)pi'  Pe  + (1  -Pe)0.5) 


= 


Then 


T5i  ->  = TS: 


case  IV:  pt!^  ^ P^jSp^j^  =P^*  edge  matched,  then,  for  large  n, 

yu  (n-  m)p\  = n • (1  - pe)p\ 
nu  (n  - m)pj'  = n • (1  - p^)p\> 


- ^ Pi 
Pu^—,=  Pu 
Pi 


y2i  (n-  m)pf  = n • (1  - pe)pf 
ri2i (n  - m)0.5  = n-(l-pe)0.5 


- ^ Pi 
P2i^^=P2. 
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and 


Then 


Pi 


+ 0.5 


= (Ti 


TSi  ->  ?l!— ^ = TS: 


case  V:  pt'^  ^ pt^j,pt^^  = pt^j  but  edge  matched,  then,  for  large  n, 


Vii  ->  (n  - m)p\  = n • (1  - pe)p\ 
Tin  ->  (n  - m)p|^  = n • (1  - Pe)pi' 

^ ^ Pi 

=^Pii^-j^=  Pii 
Pi 


and 


Pi 


y2i^m+{n-  m)p’’-  = n • [pe  + (1  - Pe)pf ] 


U2i  ->  m + (n  - m)0.5  = n • [pe  + (1  - Pe)0.5] 
Pe  + (1  - Pe)p|' 


P2i 


Pe  + (1  - Pe)0.5 


P*2i 


Pe  + (l-/pe)(p|+pj^)  ^ * 

Pe  + (1 -Pe)(pi‘  +0.5) 


<7,- 


^y'p.'d  Pi) ((i_p_)pj,  +p.  + (i-p.)o,5) 


= CT. 


Then 


TSi  = T5* 

o*i 


For  patterns  pt|  ^ pt(-,  let 


Tq* 

^max 
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then  for  the  signals  with  pattern  pt'-,  it  will  be  sufficient  find  the  range  of  pe 
satisfying 

Ts;  > 

Prom  the  above  description,  we  know  that  TSj  and  TS^  only  depend 
upon  the  embedding  probability  Pe  and  the  probabilities  of  pattern  match  in 
the  underlying  AR{1)  process,  i.e.,  p|,  p|‘  and  p|",  i = 1, 2, ... , but  it  is 
almost  impossible  to  solve  the  exact  solution  for  the  range  of  Pe  in  the  inequality. 
However,  if  the  underlying  .4/2(1)  process  is  given,  we  can  first  estimate  the  pattern 
matching  probabilities  pi’s,  p|‘’s  and  p|*’s.  Since  the  range  of  Pe  must  be  between 
0 and  1/Z,  the  proper  range  of  Pe  satisfying  the  inequality  for  identifying  the 
embedded  pattern  j can  be  solved  numerically.  In  the  later  part  of  this  section,  the 
numerical  results  for  the  range  of  Pe  with  respect  to  different  length  of  pattern  I 
and  .4/2(1)  coefficient  a will  be  presented.  □ 

Next,  we  will  prove  that  Theorem  4.1  still  holds  for  a special  case  of 
nonstationary  time  series,  random  walk. 


Theorem  4.2  For  a given  time  series  V = {ui,U2, . . where  V is  a AR(1) 
random  walk  process,  i.e., 

Vt+l  = (XVt  + €t+i, 
a = 1. 

Then  applying  the  same  setting  as  in  Theorem  4-1,  with  the  proper  range  of 
embedding  probability  Pe,  the  pattern  of  the  embedded  signals  can  be  identified 
by  PMSI  algorithm. 


Proof: 
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Using  the  same  definitions  as  in  Theorem  4.1  for  and  since  for 
random  walk  time  series, 


Vt+I  -Vt  = €t+i,t=  1,2, ..  . 


which  are  i.i.d.  white  noise,  we  have 


p[=P2  = -'-=Pl  = ^ (L  = 2^-^) 

I2  I2  I2  ^ 

Pi  =P2  ^•••=Pl  = 


2*2  + 1 


That  is,  the  probability  of  matching  any  pattern  (of  same  length)  is  the  same. 
Now,  with  the  same  definitions  as  in  Theorem  4.1  for 


Vlj ) Hlj ) y2j ) > Plj ) p2j ) Pj  31ld  Gj , 

where  pattern  ptl  is  the  same  as  the  pattern  of  embedded  signals,  we  need  to  find 
the  proper  range  of  Pe  such  that  for  alH  ^ j. 


TSj  > TSi,  where  TSi  = 


Pu  -P2i 


Now,  by  the  Strong  Law  of  Large  Number  (SLLN)  which  holds  automati- 
cally in  this  i.i.d.  case,  for  sufficiently  large  n,  for  TSj,  we  have 


Vlj 

nij 

Plj  = 

V2j 

ri2j 


m -H 
m + 


n — m 

n — m 
2*1-1 


yij  rn+^ 


nij  m + 


Plj 


2‘1 


m -I- 


m -h 


n — m 
2*2+1 
n — m 
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and 


^ ^ 

P2j  — > 

U2j  m + 


m+ 


+ ^ _ „* 
n—m  r2j 


Pi  = 


_ Vij  + V2j  2m  + (n  - m)(^  + 

nij  + n2j  2m  + (n  - m)(27^  + I) 


= Pi 


aj 


IPji^-Pj)  + 


V ^2j , 


Pi(l-Pj)  ( ^ 


-Im 


+ m 


Then 


TSj  = ^ ^ -4  ^ = TS* 

&j  0-;  ^ 


For  pattern  pt[  / pt^,  as  in  Theorem  4.1,  the  same  five  cases  should  be 
considered: 

case  I:  pt\^  ^ ^ pt^j  and  not  edge-matched,  then,  for  large  n, 


Vii 

Tlli 


n — m 

n — m 
2h-l 


and 


V2i 

ri2i 


n — m 
2'2+i 
n — m 


then 


- - _ J/l»  V2i 

PU  ~ P2i  — 

nii  ri2t 


-L-i  = o 
2*2  2*2 


which  means  TSi  rs  0 in  this  case. 
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case  II:  ^ pt^j  ,pt^i  ^ pt^j  but  edge-matched,  then,  for  large  n, 


yii 

TTij 


n — m 

“2^ 

n — m 
2<i-i 


and 


then 


J/2i 

n2< 


n — m 


2*2+1 


m + 


n — m 


Pu  — p2i  — > 7^  ~ 


2*2  2*2  + "»2'2+1 


> 0 


Then 


Pi 


(n  - m)(^  + , 

m + (n-m)(^  + i) 

- K)  ( ^ ^ ^ 


T5i  ^ = TS*i 


case  III:  pt^>  = ppy  (automatically  edge-matched),  pt^^  7^  pd^ , then,  for  large  n. 


y\i 


n — m 


2*-i 


nij  — > m + 


n — m 
2^1-1 


and 


V2i 

n2i 


n — m 


2*2+1 


m + 


n — m 


2 


then 
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_ (n  - m)/2'  ^ (n  - m)/2'^+^  ^ 

m + (n  — m)/2‘i  ^ m + (n  — m)/2 

since 

(n-m)/2'-^  _ (n-m)/2'^+^  _ ^ 

(n  — m)/2'»“^  (n  — m)/2 

Hence,  in  this  case,  T5j  < 0. 

case  IV : ^ and  not  edge  matched 

For  this  case,  as  discussed  in  the  proof  of  Theorem  4.1,  it  will  have  the  same 

T5j  as  in  case  I,  i.e.,  TSi  « 0. 

case  V:  = ptj  but  edge  matched,  then,  for  large  n. 


yu 


Tin 


n — m 
2'-i 
n — m 
2'i-i 


and 


P2i 

->■ 

m + 

n — m 

2*2+1 

Tl2i 

m + 

n — m 

2 

then 


Pli  ~ P2i 


(n  — m)/2^  ^ m + (n  — m)/2^*'''^ 
(n  — m)/2'>“^  m + (n  — m)/2 


Hence,  in  this  case,  T^j  < 0. 

Now,  we  will  check  TSj  by  applying  the  following  Lemma. 


Lemma  4.3  f/  fj-  = > where  0 < Oi  < 6i  (or  0 < 02  < 62A  02  = A:  • 01,62  = 

A:  ■ 61,  A:  > 1,  then  for  any  m > 0, 


Oi  + m 


02  + m 


> 0. 


61  + m 62  + m 
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Proof: 


oi  + m 02  + m 
bi  + m b2  + m 


ai&2  + Tnai  + 77162  + — 0261  — mbi  — 77102  — 

(61  + 7T7)(62  + m) 


77i(oi  + 62  — 61  — O2) 
(61  + 7T7)(62  + m) 


(since  O162  = 0261) 


777(01  + kb]_  — by  — ktti) 


(61  + 7T7)(62  + m) 


m[bi{k  — 1)  — Oi(A:  — 1)] 
(61  + m)(62  + m) 


m(k  — l)(6i  — Oi)  , . , , , , 

-Tz r-7z ^ > 0 (since  777  > 0,  K > 1,  and  by  > Oi) 

(6i+  777)(62  + 777)  ^ , , 1 i; 


Hence, 


Oi  + 777  02  + 777 

61+777  62+777 


□ 


Now,  for  TSj, 


Pij  ~ p2j  — 


Vlj  V2j 


77 ij  772j 

+777 

^ + m 


n—m 


n—m 

2 


+ 777 
+ 777  ’ 


Since  for  pt\  in  case  I, 


Pii  - P2i  = 0, 


i.e.. 


Vu 

nyi 


— , where 
ri2i 


• 0 < 2/ii  < 77ii,  0 < yai  < ^21,  and 

• V2i  = • j/i,-,  772i  = • 77 li,  2^*“^  > 1 since  /i  > 3 
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and 


Pij  — 


yu  + m . _ 

Tin  + m’ 


V2i  + m 
ri2i  + m ’ 


m > 0 


then  by  the  lemma  4.3,  we  have 


. ^ _ yu  + m y2i  + m ^ ^ 

Pij  — P2j  — ; ^ > 0 

Tiii  + m Ti2i  +m 


i.e,  for  all  Xj’s  have  the  same  pattern  as  embedded  signals,  as  n sufficiently  large. 


TSj  > 0. 

Hence,  it  will  be  sufficient  to  find  the  proper  range  of  such  that,  for  the 
pattern  ptj  of  the  embedded  signals, 

TSj  > TS^,  where  i is  such  that  pt\  e case  II. 

Again,  it  is  very  difficult  to  find  the  exact  solution  for  the  range  of  pe  in  the 
inequality,  even  it  is  much  simpler  than  the  one  in  Theorem  4.1.  The  numerical 
results  for  the  range  of  Pe  with  respect  to  different  length  of  pattern  I in  the 
random  walk  process  will  be  presented.  □ 

To  find  the  proper  range  of  Pe  for  the  PMSI  algorithm,  as  we  can  see  in 
the  proof,  the  first  step  is  to  estimate  each  matching  probability  pj  along  with  its 
corresponding  and  Pj?,j  = 1,2,...,  2'“^.  Figure  4.2  shows  an  example  of  the 
estimated  pj’s  for  length  I = 5,  pattern  j = 1 ~ 16,  and  the  AiZ(l)  coefficient 
a = 0, 0.4, 0.7, 1.0.  Clearly,  the  matching  probabilities  are  the  same  for  the  opposite 
patterns.  Furthermore,  the  variation  of  p^’s  is  decreasing  when  a increases  and 
the  Pj’s  become  constant  when  a = 1.0,  i.e.,  random  walk  process  as  mentioned 
in  the  proof  of  Theorem  4.2.  The  matching  probabilities  for  different  length  / and 
different  a are  estimated  in  this  study.  These  estimates  are  based  on  the  average 
of  100  times  Ai2(l)  process  simulations  with  100,000  points  for  each  time  series 
simulation. 


Probability 
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+ +-  - + + --  + + 
+ - + - + - + - + ” 


Figure  4.2:  Estimated  matching  probabilities  (Pj’s)  with  length  I = 5 and  a = 0.0 
(white  noise),  0.4, 0.7, 1.0  (random  walk). 
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The  determination  of  li  (or  I2)  for  a given  I is  not  trivial.  As  we  know, 
for  given  a time  series,  the  idea  of  the  PMSI  algorithm  is  to  identify  the  most 
consistent  length  I pattern  by  testing  how  much  more  consistent  the  pattern  of  next 
I2  points  is  when  using  the  information  of  the  previous  li  points  {pattern  li-match) 
than  only  using  the  information  of  the  previous  2 points  {edge-match).  Therefore, 
it  is  very  important  to  know  how  to  choose  /i  before  applying  the  PMSI  algorithm. 
FVom  the  algorithm,  we  know  that  h > 3 and  I2  > and  intuitively,  since  we  are 
using  the  information  of  li  points  to  test  the  consistency  of  the  next  I2  points,  li 
should  be  as  large  as  possible.  But  on  the  other  hand,  li  should  not  be  too  large  so 
that  we  can  still  detect  the  difference  between  two  conditions,  li-match  and  edge- 
match.  The  optimal  li  for  a given  I should  give  the  PMSI  algorithm  the  maximum 
power  of  pattern  identifiability;  i.e.,  the  algorithm  should  be  able  to  identify  the 
pattern  of  any  embedded  signals  with  the  minimum  embedding  probability  Pe. 

Table  4.1  shows  the  minimum  pe  required  for  the  PMSI  algorithm  for  diflFerent 
choice  of  li  for  each  given  length  I and  a.  As  we  can  see  in  the  table,  in  general, 
the  minimum  pe  values  occur  when  the  li's  are  larger  values  (>  1/2),  but  not  the 
largest.  For  example,  when  I = 7,  we  would  choose  = 4 or  5;  when  / = 8,  the 
better  choices  of  li  should  be  5 or  6,  and  so  on.  Figures  4.3  and  4.4  give  graphical 
presentations  for  / = 7 (odd  number)  and  I = 8 (even  number),  respectively.  From 
these  two  figures  we  can  clearly  see  that,  when  a > 0.8,  the  required  p^s  are  almost 
the  same,  but  for  the  small  values  of  a,  the  required  Pe  values  for  the  better  choices 
(marked  * in  Table  4.1,  e.g.,  /i  = 4, 5 for  / = 7)  of  li  are  significantly  smaller  than 
the  other  selections  of  lis. 

Figure  4.5  and  4.6  show  the  proper  range  of  embedding  probabilities  {pe) 
for  identifying  a pattern  of  embedded  signals  by  plotting  the  possible  value  of 
Pe  versus  the  function  TSj  — maxj(T5j).  The  worse  case  (zigzag  pattern,  i.e., 
for  example)  ajid  a better  case  (e.g.,  +,+,-,+)  are  shown  in  the  plots 
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Table  4.1:  Minimum  required  Pe  for  identifying  any  pattern  of  embedded  signals  for 
different  selection  of  h in  each  given  I and  a. 


a 


1 

h 

h 

0.0 

0.1 

0.2 

0.3 

0.4 

0.5 

0.6 

0.7 

0.8 

0.9 

1.0 

5 

3 

2 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

.1090 

.0383 

.0013 

4 

1 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

N/A 

.1068 

.0380 

.0009 

6 

3 

3 

N/A 

N/A 

N/A 

N/A 

.1023 

.0849 

.0652 

.0447 

.0281 

.0154 

.0005 

4* 

2 

.0570 

.0554 

.0520 

.0499 

.0470 

.0496 

.0477 

.0411 

.0300 

.0158 

.0004 

5 

1 

.1208 

.0993 

.0863 

.0770 

.0712 

.0638 

.0531 

.0412 

.0303 

.0158 

.0006 

7 

3 

4 

.0755 

.0595 

.0464 

.0431 

.0387 

.0322 

.0251 

.0203 

.0145 

.0076 

.0004 

4* 

3 

.0256 

.0231 

.0206 

.0219 

.0236 

.0239 

.0222 

.0183 

.0144 

.0084 

.0003 

5* 

2 

.0240 

.0226 

.0207 

.0211 

.0225 

.0228 

.0211 

.0183 

.0148 

.0087 

.0004 

6 

1 

.0383 

.0353 

.0339 

.0322 

.0305 

.0266 

.0227 

.0190 

.0137 

.0081 

.0005 

8 

3 

5 

.0308 

.0258 

.0219 

.0208 

.0187 

.0153 

.0122 

.0097 

.0074 

.0043 

.0003 

4 

4 

.0134 

.0121 

.0109 

.0115 

.0119 

.0115 

.0109 

.0100 

.0078 

.0048 

.0002 

5* 

3 

.0116 

.0108 

.0101 

.0106 

.0110 

.0109 

.0101 

.0087 

.0077 

.0050 

.0002 

6* 

2 

.0118 

.0108 

.0102 

.0106 

.0110 

.0109 

.0106 

.0098 

.0076 

.0048 

.0002 

7 

1 

.0198 

.0182 

.0172 

.0164 

.0149 

.0131 

.0108 

.0092 

.0071 

.0042 

.0004 

9 

3 

6 

.0167 

.0137 

.0122 

.0111 

.0100 

.0081 

.0064 

.0052 

.0036 

.0023 

.0003 

4 

5 

.0076 

.0068 

.0060 

.0062 

.0064 

.0062 

.0058 

.0050 

.0042 

.0026 

.0002 

5* 

4 

.0064 

.0058 

.0056 

.0058 

.0057 

.0057 

.0053 

.0049 

.0042 

.0028 

.0002 

6* 

3 

.0062 

.0054 

.0054 

.0055 

.0056 

.0056 

.0050 

.0048 

.0041 

.0028 

.0002 

7 

2 

.0065 

.0059 

.0057 

.0060 

.0060 

.0058 

.0054 

.0048 

.0041 

.0026 

.0002 

8 

1 

.0118 

.0107 

.0098 

.0089 

.0083 

.0068 

.0057 

.0049 

.0035 

.0022 

.0003 

10 

3 

7 

.0102 

.0083 

.0072 

.0061 

.0054 

.0044 

.0035 

.0027 

.0020 

.0013 

.0003 

4 

6 

.0045 

.0039 

.0037 

.0037 

.0037 

.0034 

.0031 

.0028 

.0022 

.0014 

.0002 

5 

5 

.0037 

.0033 

.0033 

.0033 

.0032 

.0031 

.0028 

.0026 

.0023 

.0016 

.0002 

6* 

4 

.0034 

.0031 

.0031 

.0030 

.0030 

.0030 

.0027 

.0027 

.0023 

.0016 

.0002 

7* 

3 

.0036 

.0031 

.0031 

.0032 

.0029 

.0029 

.0027 

.0025 

.0023 

.0016 

.0002 

8 

2 

.0040 

.0037 

.0033 

.0035 

.0033 

.0032 

.0030 

.0026 

.0021 

.0015 

.0002 

9 

1 

.0072 

.0064 

.0059 

.0052 

.0045 

.0038 

.0031 

.0025 

.0020 

.0013 

.0002 

11 

3 

8 

.0064 

.0048 

.0046 

.0036 

.0030 

.0025 

.0020 

.0015 

.0012 

.0007 

.0003 

4 

7 

.0030 

.0025 

.0021 

.0021 

.0021 

.0019 

.0017 

.0015 

.0012 

.0008 

.0002 

5 

6 

.0026 

.0022 

.0020 

.0019 

.0018 

.0017 

.0016 

.0014 

.0012 

.0009 

.0002 

6* 

5 

.0022 

.0018 

.0017 

.0019 

.0018 

.0016 

.0015 

.0013 

.0013 

.0010 

.0002 

7* 

4 

.0021 

.0018 

.0018 

.0018 

.0018 

.0016 

.0016 

.0014 

.0012 

.0009 

.0002 

8 

3 

.0022 

.0020 

.0019 

.0018 

.0018 

.0017 

.0016 

.0014 

.0013 

.0009 

.0002 

9 

2 

.0027 

.0023 

.0020 

.0020 

.0019 

.0019 

.0016 

.0015 

.0012 

.0008 

.0002 

10 

1 

.0047 

.0040 

.0032 

.0029 

.0026 

.0023 

.0017 

.0015 

.0011 

.0007 

.0003 
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Figure  4.3:  Minimum  required  embedding  probability  pe  for  different  combinations 
of  l]_  and  I2  for  given  length  of  signal  1 = 7 and  a. 


Figure  4.4:  Minimum  required  embedding  probability  Pe  for  different  combinations 
of  li  and  I2  for  given  length  of  signal  / = 8 and  a. 
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for  / = 5 (Figure  4.5)  and  7 (Figure  4.6),  o:  = 0.4  and  0.8.  From  the  left  top 
plot  in  Figure  4.5,  we  observed  that  when  I = 5,  the  PMSI  algorithm  will  fail  to 
identify  this  zigzag  pattern  (+,-,+,-)  for  any  possible  value  of  pe  (0  ~ 1/5)  since 
all  the  TSj  — maxj(r5j)  values  are  less  than  0.  But  when  a = 0.8,  this  pattern 
can  be  identified  if  Pe  > 0.048  as  shown  in  the  plot  (left  bottom  of  Figure  4.5). 
However,  the  algorithm  will  be  able  to  identify  the  other  pattern  (+,+,-,+)  with 
the  embedding  probability  larger  than  0.043  for  a = 0.4  (right  top  of  Figure  4.5), 
and  larger  than  0.028  for  a = 0.8  (right  bottom  of  Figure  4.5),  which  means  that, 
if  there  are  10,000  points  in  a given  time  series,  it  must  have  at  least  430  signals 
embedded  with  pattern  (+,+,-,+)  to  be  identified  by  PMSI  algorithm.  Prom  this 
study,  we  also  observed  that  for  / > 5,  a proper  range  of  Pe  can  always  be  found  for 
any  pattern  to  be  identified  and  the  range  will  vary  depending  upon  the  pattern 
of  the  embedded  signals  and  the  underlying  time  series  (a  in  Ai2(l)  process,  for 
example).  Figure  4.7  shows  these  observations  for  the  example  as  / = 6 and 
a = 0,0.4, 0.7, 1.0. 

Table  4.2  shows  the  numerical  results  of  the  minimum  required  embedding 
probability  Pe  to  identify  the  pattern  of  the  embedded  signals  for  different  length 
/ (=  6 ~ 11)  and  different  coefficient  a(=  0, 0, 1, ... , 1.0)  of  underlying  AR{1) 
time  series.  As  mentioned  earlier,  since  the  required  values  of  Pe  varies  for  different 
patterns,  the  values  represented  in  Table  4.2  are  the  global  minimum  required  Pe 
across  all  possible  patterns.  That  is,  the  Pe  value  for  each  combination  of  I and 
a shown  in  Table  4.2  is  the  minimum  required  value  of  Pe  for  PMSI  algorithm  to 
identify  any  possible  pattern  of  embedded  signals.  A graphical  representation  (for 
1 — 6,7, 8, 9)  of  Table  4.2  is  shown  in  Figure  4.8.  Clearly,  for  the  same  length  I of 
embedded  signals,  the  required  Pe  value  decreases  when  a becomes  laxger.  For  the 
same  coefficient  a of  AR{1)  process,  the  required  pe  value  is  smaller  for  the  larger 
length  I of  the  signals.  Furthermore,  when  a = 1,  i.e.,  random  walk  process,  the 


TS  -Max  (TS  .)  TS  -Max  (TS . ) 
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Figure  4.5:  Examples  of  estimating  the  required  value  of  Pe  (given  a pattern  ptj)  by 
plotting  Pe  versus  TSj  — max,  (TS'j)  for  1 = 5 and  a = 0.4, 0.8  in  >1/2(1)  time  series. 
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0 0.05  0.1 


0 0.05  0.1 


Figure  4.6:  Examples  of  estimating  the  required  value  of  (given  a pattern  ptj)  by 
plotting  pe  versus  TSj  - maXi{TSi)  for  / = 7 and  a = 0.4, 0.8  in  A/2(l)  time  series. 
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+ + + + + + + + + + + + + + + + — 

+ + + + + + + + +++  + +++  + — — 

+ + — + + — + + — + + — + + — + + — + + — + + — 

Figure  4.7:  Examples  {I  = 6,  a = 0, 0.4, 0.7, 1.0)  of  the  minimum  required  value  of 
Pe  for  different  patterns  of  embedded  signals. 
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Table  4.2:  Minimum  required  value  of  Pe  for  identifying  any  pattern  of  embedded 
signals  with  optimal  selection  of  li  in  Table  4.1. 


1 6 

7 

8 

9 

10 

11 

a 

0.0 

0.0570 

0.0240 

0.0116 

0.0062 

0.0034 

0.0022 

0.1 

0.0554 

0.0226 

0.0108 

0.0054 

0.0031 

0.0018 

0.2 

0.0552 

0.0207 

0.0101 

0.0054 

0.0031 

0.0017 

0.3 

0.0499 

0.0211 

0.0106 

0.0055 

0.0030 

0.0019 

0.4 

0.0470 

0.0225 

0.0110 

0.0056 

0.0030 

0.0018 

0.5 

0.0496 

0.0228 

0.0109 

0.0056 

0.0030 

0.0016 

0.6 

0.0477 

0.0211 

0.0101 

0.0050 

0.0027 

0.0015 

0.7 

0.0411 

0.0183 

0.0087 

0.0048 

0.0027 

0.0013 

0.8 

0.0300 

0.0148 

0.0077 

0.0041 

0.0023 

0.0013 

0.9 

0.0158 

0.0087 

0.0050 

0.0028 

0.0016 

0.0010 

1.0 

0.0004 

0.0004 

0.0002 

0.0002 

0.0002 

0.0002 
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a 


Figure  4.8:  Minimum  required  value  of  for  identifying  any  pattern  of  embedded 
signals,  where  length  / = 6, 7, 8, 9 and  a = 0, 0.1, . . . , 1.0. 
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minimum  required  embedding  probability  Pe  is  almost  equal  to  0,  which  means  that 
the  PMSI  algorithm  can  identify  any  pattern  of  the  embedded  signals  with  a very 
small  of  embedding  probability. 

In  summary,  in  this  section,  we  have  analytically  proved  the  pattern 
identifiability  by  the  PMSI  algorithm  with  the  proper  values  of  the  signal 
embedding  probability  for  stationary  AR{1)  time  series  and  for  random  walk 
process  by  first  showing  the  strong  law  of  large  numbers  (SLLN)  holds  for  the 
non-i.i.d.  random  variables  in  the  algorithm.  Also,  the  numerical  solutions  for  the 
minimum  required  embedding  probabilities  to  identify  the  pattern  of  the  embedded 
signals  were  shown  for  different  length  of  signals  and  for  different  coefficient  in 
Ai2(l)  time  series,  including  random  walk  process.  In  the  next  section,  the  pattern 
identifiability  by  the  PMSI  algorithm  will  be  tested  by  the  simulated  time  series. 

4.4  Applications  of  PMSI  Algorithm 

In  this  section,  we  will  first  test  the  pattern  identifiability  by  the  PMSI 
algorithm  on  the  simulated  A/2(l)  time  series  with  embedded  signals.  The 
algorithm  will  then  be  applied  on  an  epileptic  EEG  time  series  and  Wolf’s  monthly 
sunspot  time  series. 

4.4.1  Simulation  Studies 

Four  AR{1)  time  series  (coefficient  a = 0.1, 0.4, 0.7  and  1.0,  respectively) 
with  40,000  points  each  were  generated  20  times  as  background  time  series  before 
embedding  signals.  With  four  different  patterns  of  embedded  signals,  ptl,  ptl,  ptl, 
and  ptl  (i.e.,  two  different  patterns  for  length  1 = 7 and  9,  see  Figure  4.9),  and 
two  differnt  embedding  probabilities  for  each,  one  smaller  and  the  other  one  larger 
based  on  the  numerical  results  in  section  4.3,  the  feasibility  of  the  PMSI  algorithm 
for  identifying  the  pattern  of  the  embedded  signals  was  investigated.  To  ensure  the 
embedded  signals  cannot  be  trivially  identified,  the  values  of  the  signals  must  be 
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in  the  reasonable  range  comparing  with  the  underlying  AR{1)  time  series.  Figure 
4.10  shows  part  of  a simulated  AR{1)  time  series  (a  = 0.4)  with  the  four  different 
embedded  signals. 

Table  4.3  shows  the  results  from  20  simulations  for  all  32  cases  (different  Vs, 
a’s,  patterns,  and  signal  embedding  probabilities).  As  mentioned  in  the  previous 
section,  the  required  embedding  probability  for  identifying  patterns  depends 
upon  the  pattern  of  the  signals.  For  each  case,  the  one  with  the  small  embedding 
probability  fail  to  identify  the  pattern  of  the  embedded  signals  at  least  17  out  of 
20  simulations.  But  when  giving  the  larger  embedding  probabilities,  the  pattern 
of  the  embedded  signals  axe  successfully  identified  for  each  of  the  four  cases 
(at  least  18  out  of  20 -simulations).  One  other  observation  in  this  study  is  that, 
for  most  of  the  failed  cases,  the  identified  patterns  are  similar  to  the  one  of  the 
embedded  signals,  at  most  one  or  two  points  shift  to  the  left  or  to  the  right.  This 
means  that  the  embedded  signals  will  also  partially  affect  the  ones  with  similar 
patterns.  Therefore,  if  the  effect  of  the  similar  patterns  by  the  embedded  signals 
is  considered,  we  should  be  able  to  estimate  the  minimum  required  embedding 
probabilities  more  accurately. 
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Figure  4.9:  Four  different  patterns  pt\,  pt\,  and  pt^)  of  the  embedding  signals 
for  testing  the  pattern  identifiability  by  the  PMSI  algorithm. 
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Figure  4.10:  Four  different  patterns  {pt\,  pt\,  pt^,  and  pt^)  of  the  embedding  signals 
in  a simulated  underlying  AR{1)  time  series  (a  = 0.4). 
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Table  4.3:  Results  of  the  20  simulations  (n  = 40, 000  in  each  simulaiton)  for  testing 
the  pattern  identifiability  by  the  PMSI  algorithm. 


1 

O' 

pattern 

Pe 

success 

7 

0.1 

ptl 

0.0025 

0/20 

7 

0.1 

ptl 

0.0050 

20/20 

7 

0.4 

ptl 

0.0010 

0/20 

7 

0.4 

pt\ 

0.0050 

20/20 

7 

0.7 

ptl 

0.0005 

1/20 

7 

0.7 

ptl 

0.0020 

20/20 

7 

1.0 

ptl 

0.0002 

3/20 

7 

1.0 

ptl 

0.0008 

18/20 

7 

0.1 

ptl 

0.0030 

0/20 

7 

0.1 

ptl 

0.0060 

20/20 

7 

0.4 

ptl 

0.0030 

0/20 

7 

0.4 

ptl 

0.0060 

18/20 

7 

0.7 

ptl 

0.0015 

0/20 

7 

0.7 

ptl 

0.0060 

20/20 

7 

1.0 

ptl 

0.0001 

0/20 

7 

1.0 

ptl 

0.0006 

18/20 

9 

0.1 

ptl 

0.0005 

0/20 

9 

0.1 

ptl 

0.0015 

20/20 

9 

0.4 

ptl 

0.0005 

0/20 

9 

0.4 

ptl 

0.0015 

19/20 

9 

0.7 

ptl 

0.0005 

0/20 

9 

0.7 

ptl 

0.0015 

20/20 

9 

1.0 

ptl 

0.0001 

0/20 

9 

1.0 

ptl 

0.0004 

18/20 

9 

0.1 

ptl 

0.0008 

1/20 

9 

0.1 

ptl 

0.0050 

19/20 

9 

0.4 

ptl 

0.0008 

0/20 

9 

0.4 

ptl 

0.0060 

19/20 

9 

0.7 

ptl 

0.0008 

0/20 

9 

0.7 

ptl 

0.0050 

19/20 

9 

1.0 

ptl 

0.0001 

2/20 

9 

1.0 

ptl 

0.0005 

20/20 
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4.4.2  Application  on  EEG  Time  Series 
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Figure  4.11:  Ten  minute  EEG  time  series  data. 

Since  its  discovery  by  Richard  Caton  (1895)  and  its  first  systematic 
investigation  by  Hans  Berger  (Berger  1929,  Gloor  1969),  the  electroencephalogram 
(EEG)  has  been  the  most  utilized  signal  to  clinically  assess  brain  function.  This 
PMSI  algorithm  has  been  applied  to  identify  the  most  consistent  pattern  in  a 
30-minute  period  of  electroencephalogram  (EEG)  time  series  data  recorded  from  a 
patient  with  epileptic  seizures.  Figure  4.11  shows  a 10-minute  EEG  data  used  in 
this  application.  This  EEG  data  is  sampled  with  a sampling  frequency  of  200  Hz 
and  band-pass  filtered  at  0.1  ~ 70Hz.  Hence,  the  data  in  this  30-minute  period 
consists  of  360, 000  points  of  EEG  data  with  mean  value  104.75  and  « 600. 

The  choice  of  li  value  here  is  not  obvious,  but  we  know  that  the  identified  pattern 
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will  be  less  informative  if  k is  too  small  and  it  will  be  difficult  to  find  the  pattern 
matching  subsequences  if  li  is  too  large.  Hence,  the  values  of  k = 5, 6, 7, 8, 9, 10  and 
/2  = 6 have  been  tried.  This  means  that  first  we  find  the  pattern  /i-match  and  then 
identify  the  most  consistent  pattern  in  the  next  6 points  by  the  PMSI  algorithm. 
There  are  two  reasons  why  we  chose  I2  = 6.  First,  from  the  results  in  Table  4.1, 
we  know  that,  compared  with  lx,  I2  should  not  be  too  small  so  that  the  PMSI 
algorithm  will  have  more  powerful  identifiability.  Secondly,  for  the  purpose  of 
making  predictions  in  the  future  series,  which  will  be  discribed  in  the  next  chapter, 
the  identified  patterns  should  be  more  useful  for  predicting  more  step  ahead  than 
only  predicting  the  next  one  or  two  values.  Figures  4.12,  4.13  show  the  identified 
most  consistent  patterns  for  lx  = 5, 6, 7, 8 and  Table  4.4  shows  the  statistical  results 
of  this  algorithm  for  six  different  lx  values,  where  pi’s  and  p2’s  in  the  table  are  the 
two  estimated  conditional  probabilities  (/2-match  given  /i-match  and  /g-match  given 
ed^e-match)  for  the  most  predictable  patterns. 

Table  4.4:  Pattern  identification  analysis  in  a 30-minute  EEG  time  series. 


tJ  of  Tests 

Pi 

P2 

TS 

Overall  p-value 

5 

6160 

0.5733 

0.048 

21.187 

< 0.00001 

6 

6148 

0.5349 

0.048 

21.024 

< 0.00001 

7 

5270 

0.5238 

0.048 

17.596 

< 0.00001 

8 

4343 

0.6452 

0.048 

15.502 

< 0.00001 

9 

3485 

0.4615 

0.048 

13.899 

< 0.00001 

10 

2709 

0.6667 

0.048 

14.134 

< 0.00001 

As  we  can  see  in  the  Table  4.4,  with  significance  level  a = 0.05,  all 
/i  = 5 ~ 10  pass  the  test  of  been  meaningful  patterns  (overall  p-value  less  than 
0.05).  The  identified  pattern  has  the  largest  test  statistic  TS  when  lx  = 5,  which 
means  that  the  subsequences  have  the  most  consistent  pattern  in  the  next  6 points 
when  their  previous  5 points  are  pattern  5-matched  to  each  other.  In  the  next 
chapter,  we  will  use  these  identified  patterns  to  forecast  the  future  values  in  the 
next  30-minute  EEG  time  series. 
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O"  Pattern  Ij  - match  part 
• - Test  of  consistency  of  next  I 2 (=6)  points  part 


Figure  4.12:  Ten  identified  signals  in  this  30-minute  EEG  time  series  using  li  = 5,6 
and  I2  = 6. 
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O - Pattern  - match  part 
# - Test  of  consistency  of  next  I2  (=6)  points  part 
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Figure  4.13:  Ten  identified  signals  in  this  30-minute  EEG  time  series  using  li  = 7,  S 
and  I2  = 6. 
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4.4.3  Application  on  Sunspot  Time  Series 


Figure  4.14:  The  first  1500  points,  from  1749  to  1873,  of  the  six-month  smoothed 
Wolf’s  monthly  sunspot  time  series. 

The  PMSI  algorithm  has  been  applied  on  the  Wolf’s  monthly  sunspot  time 
series  as  described  in  Section  2.3.  In  order  to  better  identify  a meaningful  pattern 
and  because  of  the  oscillation  of  the  original  monthly  sunspot  time  series,  the 
six-month  smoothed  data  of  the  first  125  years,  from  1749  to  1873,  were  used  in  the 
algorithm  to  identify  the  most  predictable  pattern  in  this  data.  Figure  4.14  shows 
the  smoothed  sunspot  time  series  used  in  this  analysis. 

The  length  of  pattern  I = 12, 13, 14, 15  with  = 6, 7, 8, 9 and  ^2  = 6 
have  been  applied  in  the  PMSI  algorithm,  which  means  that  the  algorithm  will 
identify  the  most  consistent  pattern  of  the  sunspot  data  in  the  next  6 months  if  the 
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Figure  4.15:  Five  identified  signals  (/i  = 6,7, 8,9  and  I2  = 6)  with  the  same  most 
predictable  pattern  of  the  six-month  smoothed  Wolf’s  monthly  sunspot  time  series. 
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previous  6 (or  7,8  and  9)  months  of  the  data  are  pattern  matched  to  each  other. 

The  reason  to  choose  ^2  = 6 is  because  the  length  of  the  smoothing  window  is  6 
months.  Figure  4.15  shows  the  identified  patterns  for  four  different  length  of  signals 
and  Table  4.5  gives  the  statistical  results  of  the  PMSI  algorithm.  The  definitions  of 
Pi’s  and  p2*s  in  the  table  are  the  same  as  in  the  EEG  time  series  application. 

Table  4.5:  Pattern  identification  analysis  in  smoothed  sunspot  time  series. 


k 

tt  of  Tests 

Pi 

P2 

TS 

Overall  p-value 

6 

52 

0.487 

0.140 

5.640 

< 0.00001 

7 

35 

0.546 

0.139 

6.273 

< 0.00001 

8 

18 

0.548 

0.139 

6.140 

< 0.00001 

9 

15 

0.560 

0.139 

5.730 

< 0.00001 

As  we  can  see  in  the  Table  4.5,  with  significance  level  a = 0.05,  all 
= 6 ~ 9 pass  the  test  of  being  meaningful  patterns  (overall  p>-vaJue  less  than 
0.05).  The  identified  pattern  has  the  largest  test  statistic  TS  when  /i  = 7,  which 
means  that  the  sunspot  data  has  the  most  consistent  pattern  in  the  next  6 months 
when  the  previous  7 months  of  data  are  pattern  matched  each  other.  Besides,  as  in 
the  EEG  time  series  analysis,  the  identified  patterns  for  different  li  values  shown  in 
Figure  4.18  are  consistent,  which  may  indicate  that  the  most  predictable  pattern  in 
a time  series  is  invariant  over  the  li  values  used  in  the  PMSI  algorithm.  In  the  next 
chapter,  we  will  use  these  identified  patterns  to  forecast  the  future  sunspot  values 
in  the  next  103  years,  from  1874  to  1976,  of  sunspot  time  series. 

4.5  Summary  and  Discussion 

In  this  chapter,  based  on  the  concept  of  the  pattern  match  criterion 
described  in  Chapter  3,  we  introduced  the  Pattern  Match  Signal  Identification 
(PMSI)  algorithm  for  identifying  the  most  predictable  pattern  of  signals  in  a given 
time  series.  By  first  proving  the  validity  of  the  strong  law  of  large  numbers  (SLLN) 
for  the  weakly  dependent  pattern  match  random  variables  (/'-match,  /i-match, 
/j-match  and  edge-match),  analytically,  we  demonstrated  that,  for  a given  AR{1) 
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time  series  and  the  length  of  the  embedded  signals  I,  the  pattern  of  the  embedded 
signals  can  be  identified  by  the  PMSI  algorithm  if  the  embedding  probability 
is  larger  than  a minimum  required  value.  This  minimum  required  embedding 
probability  can  be  estimated  if  the  underlying  time  series  is  known.  The  numerical 
solutions  of  the  minimum  required  Pe  were  shown  for  different  underlying  >1/2(1) 
processes  (with  respect  to  the  coefficient  a)  and  for  different  length  of  signals  (/). 

The  expected  pattern  identifiability  by  the  PMSI  algorithm  was  confirmed 
by  the  simulated  A/2(l)  time  series  with  different  coefficient  a,  length  and  pattern 
of  the  embedded  signals,  and  embedding  probabilities.  Based  on  the  numerical 
results  of  the  minimum  required  Pe,  for  each  case  of  a,  I and  pt^,  two  values  of  pg 
were  used  and  applied  the  PMSI  algorithm  to  identify  the  pattern  of  the  embedded 
signals.  The  results  showed  that,  when  the  value  of  Pe  is  too  small,  the  algorithm 
failed  to  identify  the  pattern  of  the  embedded  signals,  but  it  successfully  identified 
the  patterns  with  the  larger  value  of  pg.  These  results  confirmed  the  identifiability 
conditions  of  the  PMSI  algorithm  proved  in  Section  4.3. 

In  this  simulation  study,  for  some  cases,  the  larger  values  of  pg  used  are 
smaller  than  the  minimum  required  pg  (given  in  Table  4.2).  As  mentioned  earlier, 
in  practice,  we  will  not  know  the  pattern  of  the  signals  in  a time  series,  and  the 
minimum  pg  is  the  value  such  that  the  PMSI  algorithm  will  identify  the  pattern 
of  any  embedded  signals.  But  for  most  of  the  patterns,  their  minimum  (local 
minimum)  required  values  of  pg  are  smaller  than  the  overall  minimum  (or  global 
minimum)  required  pg. 

This  algorithm  was  also  applied  in  two  real  time  series  data  sets:  one  is  the 
electroencephalogram  (EEG)  time  series  from  a patient  with  epileptic  seizures, 
the  other  one  is  the  well  known  Wolf’s  monthly  sunspot  time  series.  For  both 
applications,  the  PMSI  algorithm  successfully  identified  the  most  predictable 
patterns  in  the  time  series  with  the  significant  statistical  results.  More  importantly. 


80 


with  different  length  of  the  subsequences,  the  algorithm  identified  the  same  pattern 
of  signals  in  the  time  series.  This  may  indicate  that  the  most  predictable  pattern  of 
signals  in  a time  series  can  be  identified  by  the  PMSI  algorithm  with  any  length  of 
testing  subsequences  in  a reasonable  range. 

In  conclusion,  based  on  the  idea  of  pattern  match  and  statistical  analysis, 
the  proposed  PMSI  algorithm  provides  a reliable  method  to  identify  the  most 
predictable  pattern  of  the  signals  in  a time  series  if  the  probability  of  the  signal 
occurrences  in  the  time  series  is  in  a reasonable  range.  In  the  next  chapter,  we  will 
show  how  to  apply  these  identified  patterns  for  the  forecasts  in  the  future  time 
series. 


CHAPTER  5 
FORECASTING 

5.1  Method 

As  mentioned  in  Chapter  1,  one  of  the  main  purposes  of  the  time  series 
analysis  is  to  determine  what  information  in  the  past  we  should  use  in  order 
to  predict  the  future.  In  Chapter  4,  we  have  introduced  the  PMSI  algorithm 
to  identify  the  most  predictable  pattern  in  a time  series.  We  can  call  the  time 
period  of  this  time  series  learning  period.  In  this  chapter,  we  will  discuss  how  the 
identified  patterns  by  the  PMSI  algorithm  in  the  learning  period  can  be  used  to 
make  forecasts  of  their  future  behavior.  That  is,  if  the  period  of  a near  future  time 
series  after  the  learning  period  is  called  prediction  period,  we  will  describe  how  to 
use  the  signals  (or  subsequences)  with  the  most  predictable  pattern  identified  by 
the  PMSI  algorithm  in  the  learning  period  and  then  test  if  this  pattern  will  remain 
consistent  in  a near  future  prediction  period. 

Suppose  in  a learning  period  time  series  C7  = {ui,U2,  • • • , ««}> 

®ii)  ®i2)  • • • > of  length  lx  have  been  identified  by  the  PMSI  algorithm  such  that 
their  next  I2  points  have  the  most  predictable  pattern,  then  the  question  is:  How 
to  use  them  for  predicting  the  future  time  series?  That  is,  for  the  next  sequence 
of  time  series  data,  say  V = {vi,  V2,...,  Vn},  can  the  subsequences  ajj’s  provide  a 
good  prediction  for  the  value  Vj+i_i  if  the  subsequence  Vj  = {vj,  Uj+i, . . . , is 

pattern  Zi-matched  with  ®j’s? 

In  the  learning  period  U,  we  define  the  most  predictable  pattern  as  the 
pattern  that  gives  the  most  consistent  next  I2  points  when  the  previous  lx  points 
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are  pattern  Zi-matched.  Hence,  intuitively,  we  should  use  all  the  signals  with  li~ 
match  pattern  in  the  learning  period  jointly  for  the  future  predictions.  Regression 
model  which  regresses  the  next  I2  points  on  the  previous  Zi  points  is  the  obvious 
choice,  i.e.,  to  use  regression  equation  to  predict  if  {vj,  vj+i, is 

pattern  Zi-matched  with  Xi^  = {uj,,  . . . , where  Xi^  is  the  referenced 

signal  for  identifying  this  pattern  in  the  learning  period.  The  performance  of  this 
prediction  method  was  evaluated  for  the  prediction  periods  of  the  EEG  time  series 
and  Wolf’s  monthly  sunspot  time  series  in  which  their  learning  periods  were  both 
analyzed  by  the  PMSI  algorithm  in  Section  4.4. 

5.2  Forecasting  in  EEG  Time  Series 
A 30-minute  prediction  period  of  EEG  time  series  following  the  30-minute 
learning  period  used  to  identify  the  most  predictable  pattern  in  Section  4.4.2  was 
applied  to  evaluate  the  prediction  method  described  in  the  previous  section.  As 
mentioned,  the  idea  is  to  test  how  well  the  identified  pattern  in  the  learning  period 
can  be  used  to  make  predictions  beyond  the  learning  period.  As  described  above, 
we  first  search  for  all  the  sequences  with  length  li  which  can  be  pattern  Zi-matched 
with  the  most  predictable  signals  identified  in  the  learning  period.  The  prediction 
values  for  the  next  I2  points  can  then  be  calculated  by  applying  the  regression 
equations  fitted  in  the  learning  period  for  the  signals  with  the  most  predictable 
pattern.  Average  prediction  error  is  calculated  and  is  compared  with  the  ones 
predicted  by  the  AR{li)  model,  which  is  fitted  in  the  whole  learning  period.  One 
may  argue  that  the  AR(li)  may  not  be  the  best  model  in  the  learning  period. 
Therefore,  the  prediction  performance  of  the  proposed  method  is  also  compared 
with  the  optimal  AR  model  which  is  identified  by  the  generally  applied  Akaike’s 
Information  Criterion  (AIC)  (Akaike  1974).  Table  5.1  gives  the  AR{li)  models 
fitted  in  the  learning  period.  The  optimal  AR  model  identifed  by  AIC  values  fitted 
in  the  learning  period  has  of  order  596,  which  means  A/2(596)  has  the  smallest  AIC 
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value  of  all  possible  orders  of  AR  models.  For  the  patterns  identified  in  Section 
4.4.2  for  li  = 5,6,...,  10,  Table  5.2  summaries  their  regression  equations  for 
predicting  the  next  6 points. 

Table  5.1:  AR{li)  models  fitted  in  the  learning  period  of  an  EEG  time  series  for 
/i  = 5, 6, . . . , 10,  where  u\  = Ui~  n and  /x  is  the  mean  value  of  U. 


h 

AR{li)  models  (noise  terms  ignored) 

5 

= 2.18uJ_i  - 1.97u;_j  + 1.33u;_g  - 0.86uJ_4  +0.29ti:_g 

6 

= 2.21u;_,  - 2.06ui_2  + 1.48u<_3  - 1.08u:_4  +0.54u;_g  - 0.12u;_g 

7 

u\  = 2.21u;_,  - 2.06u;_j  + 1.48t«;_3  - 1.07u;_4  + 0.53u;_g  - 0.11ti-_g  - 0.004uJ_., 

00 

u'i  = 2.21u'i_,  - 2.06«;_2  + 1.47u-_3  - 107u-_4  + 0.52u;_3  - 0.10u;_g  - 0.02u;_t  +0.006«-_g 

9 

uj  = 2.21»;_1  - 2.06u5_3  +1.48uJ_3  - 1.07uJ_4  +0.84u;_5  - 0.1U<_3  + 0.008uJ_7  - 0.02uJ_g  +0.01u<_8 

10 

= 2.21u;_,  - 2.06u;_3  +1.48u<_3  - 1.08u;_4  +0.67u;_g  - 0.19u<_g  +0.llu;_^  - 0.16u<_g  +0.16u;_g  - 0.0ru’-_^g 

Figures  5.1  and  5.2  show  part  of  the  /i-match  subsequences  in  the  prediction 
period  for  l\  = 5, 6, 7, 8,  respectively,  and  their  6-step  ahead  predictions  by 
regression  equations,  AR{h)  and  optimal  AR  models.  With  /i  = 5, 6, ... , 10,  the 
numbers  of  the  pattern  /i-matched  subsequences  in  the  prediction  period  are  62, 

76,  59,  37,  47,  23,  respectively.  The  results  of  the  average  prediction  error  (average 
difference  between  predicting  value  and  true  value)  by  the  regression  equations 
from  the  identified  pattern  and  by  the  two  AR  models  (/i  and  optimal)  axe  shown 
in  Table  5.3. 

As  we  can  see  in  Table  5.3,  for  the  pattern  /x-matched  subsequences, 
the  predictions  for  the  next  farther  (four-,  five-  or  six-step  ahead)  points  by  the 
proposed  method  are  all  significantly  better  than  the  ones  by  the  AR{li)  or  the 
optimal  AR  models  identified  by  AIC  values.  Furthermore,  the  prediction  errors 
when  li  = 5, 6, 7 are  smaller  than  when  l\  = 8, 9, 10,  which  is  corresponding  to  their 
larger  test  statistic  values  in  the  learning  period  (Table  4.4).  In  conclusion,  for  the 
application  on  this  long  term  EEG  time  series,  the  PMSI  algorithm  not  only  can  be 
used  to  identify  the  most  predictable  pattern  of  the  signals,  but  also  can  help  for 
making  the  future  predictions. 


84 


Table  5.2:  Regression  equations  of  the  next  6 points  regress  on  the  previous 
points  when  the  subsequences  are  pattern  /i-matched  identified  by  the  PMSI  algo- 
rithm in  the  learning  period  of  an  EEG  time  series  U . 


h 


Regression  equations  (error  terms  ignored) 


Uf.^1  — —649.83  + 1.74Uf  — l-84tij_i  + 1.59tij_2  — l-56t4j_g  -f-  0-60tif_4 
ii{^2  “ — 1208. 10  + 1.82u j — 2.69u j j 4"2.31u^_2  — 2.73ti -}-  0.80u^_4 
= -683.28  + 2.24u;  - 3.37uj_i  2.60ui_2  - 2.97ui_g  + 0.77ui_4 

u*+4  = —445.08  2.14u{  — 3.58u^«i  4-  2.88uj_2  “ 2.62ui^g  + 0.38ti»^4 

= —419.45  -p  1.57uj  — 3.14uj_i  + 2.21ui_2  ~ 2.13tii_g  — 0.16ui_4 

Ui+fl  = 188.37  + 1.23u;  - 2.66ui_i  + 1.77u^_2  - 1.58u;_g  - 0.28u^_4 


— 226.89  4*  2.34t<i  — 2.24u{_|^  4^  1.70uj_2  *“  1.32w^  _g  4"  0.64ti^^4  4"  0.05t*^ 

u j^2  — — 1.38  + 2.72u  j — 3.02u;_]^  4"  l-44u;_2  — 0.71ti^_g  -4  0. 19ti^  _4  4"  0.49ti^_g 

tii^g  =s  123.82  4*  3.06uj  — 3.66«<_i  + 1.09tij_2  + 0.10tii_g  — 0.08uj_4  4-  0.56u^_g 

ti|^4  — 224.95  4"  2.96u|  — 4.21u^_i  4*  1.80tij ^2  — 0.49tt(^g  ^0.61ui_4  4"0. 17uj._g 

uj^g  = —331.44  -{-  2.11tij  — 3.96ti,'_i  + 1.86ui_2  — 0.47tt^_3  4"  0.63ui_4  — 0. 17uj.g 
tii+6  = -762.42  4-  1.29t*i  - 3.59ui_i  + 1.66w;_2  - 0.01«i_g  +0.23ui_4  - 0.68ui_g 

— —462.82  4"  1.72u;  — 1.47uj ^ 4"  0.73u;_2  — 0.49ti^_g  -}-  0.07ti;_4  4"0.11tij_g  0.07uj_g 

ti{4.2  — — 774.95  4"  2.07u  j — 2.28tAj j 4*  0.76ti^ _2  — 0.44u^_g  4*  0.03tij  _4  4"  0.39t*j _g  4“  0. 26t4,'«.g 
Uf^g  = —918.37  + 2.33ti{  — 3.26uj_i  4"  1.22tii_2  — 0.72tii_g  + 0.36t<{_4  -4  0.84u{_g  — 0.08tf^_g 
u{^4  = —1478.99  + 1.89ui  — 3.60u^_i  -4  1.93ui_2  — 1.77i*i_g  -4  1.86t»i_4  — 0.03u{_g  4"0.69tii_< 
tif^g  — — 2165.56  "4  0.99ii j — 3.36ti{_|^  4*2.27u^^2  — 2.47t*^_g  4"  3. 1 lu{ _4  •”  1. 18u^_g  4^  1.39wj _ ^ 
tij+g  = -2405.61  +0.25ui  - 3.09«i_i  + 2.67u;_2  - 3.07u;_3  +3.92ti;_4  - 1.69«i_g  + 1.30«i 


2.29tij^^  4 1.53uj_2  — 0.78t*i_g  4 0.38tii_4  4 0.49t*i_g  — 0.96uj_g  4 0.26ti^_7 

>.42u,’_4  4 1.71u;_g  — 2.43uj_g  4 0.27u;_7 


Ui+i  = 654.94  4 2.48uj 

ui+2  = 3822.64  4 4.28ui  - 3.62«i«i  4 2.31u<_2  - 0.90ti4_3  4 O.^ 
u{^3  = 7460.07  4 6.69t*^  — 6.02u{_^  4 4.01tit«-2  ~ 0.79ti4_3  — 1.69uj_4  4 6.34ti^_g  • 

— 7.12u 4^*81**i  — 2 — 0.26u j _g  — 3.94*4^ _ 4 4 10.60ti ^ _g  — 10. 50u j_g  4 1. 16ti^ _7 

I — 4.74u^_4  4 11.09tii_g  — 10.21u{_g  — 0.06ti^_7 


Ui^.4  = 7733.30  4 6.96u 
«i-l-g  = 7853.23  4 6.36u 
= 9203.96  4 7.10tx 


6.71tii_g  4 0.72ui_7 


— 6.61tij_^  44.17i*^_2  40.88tx^_g  • 

7.90ti 4 5.36u^ _2  4 !•  16u^ _g  — 6. 18uj_4  4 12.81t*^_g  — 10.99u^ _g  — 0. 51t*^_7 


M»+l  = —892.64  4 1.74u^  — 2.1duj...i  4 l-74tii_2  — 1.83tii_g  4 1.79t*<_4  — 1.01uj_g  — 0.56ti{_g  4 1.56ti^_7  — 0.90ti{_g 
“t42  = -2621.34  4 1.40«i  - 2.79ui_i  4 1.67u^_2  - 1.75ui_3  4 2.20ui_4  - 1.46u^_g  - 1.20ui_g  4 4.09tii_7  - 2.16ui_8 

t4f ^3  — —3998.66  4 0.70t4j  — 3. 14u^_i  4 1.49ti^_2  — 1.56u^  _g  4 2.31t*^_4  — 1. 70tn_g  — 0.79ii{ _g  4 5.37u^  _7  — 3.24u 

t4{q.4  = —4909.28  4 0.09t*^  — 3.84uj^i  4 2.66u^_2  — 3.00ti^_g  4 3.92u^_4  — 2.74u^_g  4 0.07u^_g  4 6.16tij_7  — 4.58u{_g 

Uj^.g  = —4910.14  — 0.42uj  — 4.18tii_j  4 3.39u^_2  — 3.64ui_g  4 4.27uj_4  — 1.90tij_g  — 0.98uj_g  4 7.40ti^_7  — 6.61u;_g 

u{^.g  = —4465.99  — 0.94u{  — 3.96ui_i  4 3.36u^_2  — 3.10ui_3  4 3.16t*i_4  4 0.23u^_g  — 2.44u{_g  4 7.84u^_7  — 7.96ui_ 


10 


“<44  ^ 


1.12ui_i  -0.13«i_2  4 0.63u<_3  40, 

, • 0.48uj_i  — 1.92uj_2  4 2.09ti^_3 

1.59ti{  4 0.62tn_i  — 3.39ti^^2  4 3.06«^_g 
9855.21  — 3.25u{  4 0.98u^_j  — 3.63ti^_2  4 2.71ti^_g  — 0.82tx^_4  4 3.30ii^_g 


= —896.00  4 1.49«, 

Ui^>2  = —3028.26  4 0.66t*4  — 0.48u»_i  — 1.92uj_2  4 2.09ti 

“t43  = -6955.07  - 1.59tii  4 0.62tii_i  - 3.39tii_2  + 3.06«^_3  - 1.04tx^_4  4 l-67«i_g  - 2 


l.20ti;_4  - 1.02ui_g  4 1.36ti^_g  - 0.41u;_7  - 1.61t*i_8  4 112^ 
ii—3  4 0.11ui^4  — 1.68t*i_g  4 1.79ti^_g  — 0.02uj_7  — 3.23ti{^8  + 2.31 


- 6.45ui_8  4 3, 
%—j  — 6.65ui_8  43 

24u{  4 1.79ti^ — 3.94u  j _2  4 2.33tt<_3  4 0.66u^  _4  4 2. 19u{  _g  — 5.25u  j_g  4 5. 29u^  _7  — 4.98u ^ _8  ~l"  3 


i.77wj_g  4 4'00tt^_7  — 6.' 

6.37ui_g  4 6.60tii_7  - 6.1 


ui+g  = -12906.47  - 6 

Uj^g  = —16379.93  — 7.65tii  4 3.04u^_i  — 4.53tx{_2  4 1.73u^_g  4 2.32u^_4  4 l.05u^_g  — 5.16uj_g  4 6.93u<_7  — 5.3luj_8  4 < 
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Signals 


# - six-step  ahead  predictions  by  regression  model 
A “ six-step  ahead  predictions  by  optimal  AR  model 
X - six-step  ahead  predictions  by  AR(  /,  ) model 


Signals 


Figure  5.1:  Pattern  Zi-matched  subsequences  and  their  6-step  ahead  prediction 
values  in  the  prediction  period  of  an  EEG  time  series  when  li  = 5 and  6. 
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Signals 


# - six-step  ahead  predictions  by  regression  model 
A - six-step  ahead  predictions  by  optimal  AR  model 
X - six-step  ahead  predictions  by  AR(  /. ) model 


Signals 


Figure  5.2:  Pattern  /i-matched  subsequences  and  their  6-step  ahead  prediction 
values  in  the  prediction  period  of  an  EEG  time  series  when  lx  = 7 and  8. 
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Table  5.3:  Average  prediction  errors  for  the  next  6 points  in  the  prediction  period 
of  an  EEG  time  series  by  regression  equations,  AR{li)  and  optimal  AR  models 
fitted  in  the  learning  period. 


h 

Regression 

AR{h) 

AR  — opt. 

5 

134.45 

179.05 

166.40 

^^•+2 

306.42 

518.26 

485.06 

Vi+3 

400.55 

925.49 

872.34 

Vi+4 

475.79 

1343.54 

1293.50 

^<+5 

536.24 

1728.30 

1690.20 

^^i+6 

546.52 

2033.01 

2020.04 

6 

Vi+1 

140.24 

159.82 

158.82 

Vi+2 

289.37 

414.18 

409.36 

Vi+3 

408.16 

757.85 

742.95 

Vi+4 

536.84 

1167.93 

1148.39 

Vi+5 

602.04 

1534.74 

1506.31 

Vi+6 

649.19 

1827.70 

1786.66 

7 

Vi+1 

131.06 

179.13 

180.43 

Vi+2 

311.27 

531.23 

528.60 

Vi+3 

454.86 

953.53 

940.46 

Vi+4 

577.62 

1397.41 

1381.53 

Vi+5 

667.98 

1802.30 

1777.28 

Vi+6 

671.72 

2095.01 

2057.17 

8 

Vi+1 

165.21 

151.05 

144.48 

Vi+2 

464.37 

441.59 

436.75 

Vi+3 

708.16 

780.20 

762.38 

Vi+4 

936.65 

1177.71 

1141.50 

Vi+5 

1104.35 

1586.44 

1539.43 

Vi+3 

1291.03 

1958.55 

1893.98 

9 

Vi+1 

123.85 

130.00 

128.53 

Vi+2 

314.47 

414.19 

403.15 

Vi+3 

456.32 

785.61 

768.05 

Vi+4 

615.79 

1180.41 

1161.99 

Vi+5 

760.73 

1500.27 

1468.78 

Vi+6 

837.97 

1727.31 

1684.57 

10 

Vi+1 

162.55 

137.89 

126.68 

Vi+2 

384.34 

396.37 

368.02 

Vi+3 

591.32 

662.33 

630.11 

Vi+4 

824.51 

991.14 

961.40 

Vi+5 

962.33 

1373.79 

1331.35 

Vi+6 

1049.34 

1726.54 

1680.50 
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5.3  Forecasting  in  Wolf’s  Sunspot  Time  Series 

In  Section  4.4.3,  we  applied  the  PMSI  algorithm  on  the  6-month  smoothed 
Wolf’s  monthly  sunspot  time  series  from  1749  to  1873  and  identified  its  most 
predictable  patterns  for  different  length  of  signals.  In  this  section,  we  will  apply 
the  introduced  forecasting  method  to  make  predictions  for  the  remaining  period 
of  time  series.  The  prediction  period  is  from  1874  to  1976  which  includes  1236 
points  of  the  time  series.  Since  the  time  series  we  used  is  6-month  smoothed,  we 
will  focus  the  predictions  on  the  values  of  the  next  sixth  point  to  ensure  that  the 
values  to  be  predicted  are  not  overlapped  with  the  pattern  /i-matched  part  of  the 
subsequences.  As  in  the  forecasting  for  the  EEG  time  series,  first  we  have  to  fit  the 
regression  equations  for  the  pattern  /i-matched  subsequences  (the  most  predictable 
pattern)  in  the  learning  period.  Table  5.4  gives  these  fitted  regression  equations  for 
h = 6, 7, 8, 9.  These  regression  equations  will  be  used  to  make  predictions  once  the 
identified  patterns  are  observed  in  the  prediction  period. 

There  are  some  well  known  time  series  models  for  analyzing  Wolf’s  sunspot 
data  such  as  Box  and  Jenkins’s  AR{2)  model  (1976),  Bailey’s  AR{6)  model  (1965), 
and  Woodward  and  Gray’s  ARMA{8, 1)  model  (1978).  However,  these  models 
were  all  concentrated  on  the  yearly  sunspot  data  and  may  not  be  good  models  for 
the  monthly  data.  Therefore,  we  cannot  compare  our  results  with  these  models. 

To  be  consistent  with  the  previous  EEG  application,  the  prediction  results  by  the 
identified  regression  equations  will  be  compared  with  the  AR{li)  and  optimal  AR 
models  (by  AIC  values).  Table  5.5  shows  the  AR{lx)  models,  l\  = 6,7, 8,9,  fitted  in 
the  entire  learning  period.  The  optimal  AR  model  identifed  by  AIC  values  fitted 
in  the  learning  period  has  of  order  55,  which  means  A/2(55)  has  the  smallest  AIC 
value  of  all  possible  orders  of  AR  models. 

The  ARM  A models  by  Woodward  and  Gray’s  R-  and  S-  axrays  method 
could  be  simply  approximated  to  an  AR  model.  For  example,  their  ARMA{8, 1) 
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model  for  the  yearly  sunspot  data  can  be  approximated  to  an  yli?(10)  model. 
Hence,  the  comparison  between  the  proposed  forecasting  method  and  their  models 
should  be  covered  by  comparing  with  the  optimal  AR  model  as  mentioned. 


Table  5.4:  Regression  equations  of  the  next  sixth  point  regress  on  the  previous 
li  points  when  the  subsequences  are  pattern  /i-matched  identified  by  the  PMSI 
algorithm  in  the  learning  period  of  the  smoothed  sunspot  time  series  U. 


h 

Regression  equations  (error  terms  ignored) 

6 

= 39.66  — 0.27uj  — O.lOti^^i  -f  1.60ui_2  — l'34t*j_3  + 0.96u^_4  — 0.36ui_5 

7 

Ui+fl  = -26.98  + - 1.82ui_i  + 2.Uui_2  - 2.13u;_3  -t-2.66ui_4  - 2.68tii_3  + 

00| 

= 13.82  + 1.67ui  — 1.06uj_i  -|-  0.93u{^2  ""  2.26uj_3  + 3.06t<t  — 4 ~ l.51U|_5  + 2.30tii_g  — 2.10tii_7 

9 

= 21.03  + 2.74u^  — 6. 16ti{^ j + 8. 18uj _2  — 6.12u{_3  + 8.08u j _4  — 4. 18u ^ _5  -f-  4.74u — 4.71ti^ _7  — 0.20ut  — 8 

Table  5.5:  AR{1{)  models  fitted  in  the  learning  period  of  smoothed  monthly 
sunspot  time  series  for  /i  = 6, . . . , 9,  where  uj  = itj  — //  and  /x  is  the  mean  value  of 
U. 


h 

AR{li)  model  (noise  terms  ignored) 

6 

= 1.46ui_i  -0.37u;_a  -0.08uJ_3  +0.06u<_4  - 0.13u;_j  +0.06u;_g 

7 

uj  = 1.43u5_,  - 0.32u5_a  - 0.10«;_3  +0.08u<_4  - 0.02uJ_g  - 0.39u;_g  +0.31uJ_j 

00 

u;  = 1.53u<_,  - 0.44u;_2  - 0.10u:_3  +0.10u<_4  - 0.06u;_g  - 0.49u;_g  +0.74u:_,.  - 0.30u;._g 

9 

uj  = 1.49uJ_,  - 0.34u;_3  - 0.l7«;_g  +0.10ti;_4  - 0.04ti;_g  - 0.50u;_g  + 0.68u;_.,  - 0.09u;_g  - 0.14u;_g 

Figures  5.3  and  5.4  show  the  observed  /i-match  subsequences  in  the 
prediction  period  for  Zi  = 6 ~ 9,  respectively,  and  their  true  values  and  the  6-step 
ahead  predictions  by  regression  equations  and  AR  models.  With  Zi  = 6, . . . , 9,  the 
numbers  of  the  pattern  Zi-matched  subsequences  in  the  prediction  period  are  24,  19, 
15,  11,  respectively.  The  results  of  the  average  prediction  errors  by  the  regression 
equations  from  the  identified  patterns  and  by  the  compared  AR  models  are  shown 
in  Table  5.6. 


Sunspot  Sunspot 
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Figure  5.3:  Pattern  Zi-matched  subsequences  and  their  prediction  values  in  the 
prediction  period  of  smoothed  monthly  sunspot  time  series  when  li  = Q and  7. 


Sunspot  Sunspot 
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^ — six-step  ahead  predictions  by  regression  model 
A " six-step  ahead  predictions  by  optimal  AR  model  | 
X — six-step  ahead  predictions  by  AR(  Q model 
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Figure  5.4:  Pattern  Zi-matched  subsequences  and  their  prediction  values  in  the 
prediction  period  of  smoothed  monthly  sunspot  time  series  when  li  = 8 and  9. 
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Table  5.6:  Average  prediction  errors  for  the  next  sixth  point  in  the  prediction  pe- 
riod of  smoothed  monthly  sunspot  time  series  by  regression  equations  and  AR 
models  fitted  in  the  learning  period. 


k 

Regression 

AR{k) 

AR  — opt. 

6 

12.496 

18.109 

16.771 

7 

Vi+6- 

11.032 

23.002 

17.197 

8 

Vi+6- 

11.019 

21.240 

18.570 

9 

ViW- 

16.600 

21.139 

20.066 

As  we  can  see  in  Table  5.6,  for  the  observed  pattern  Zi-matched  subse- 
quences, the  predicted  sunspot  values  for  the  next  sixth  month  by  the  proposed 
method  are  all  significantly  better  than  the  ones  either  by  the  AR{li)  or  optimal 
AR  models.  In  addition,  the  prediction  errors  when  /i  = 7, 8 is  smaller  than  the 
other  values  of  lis,  which  is  corresponding  to  their  larger  test  statistic  values  in 
the  learning  period  (Table  4.5).  In  summary,  for  the  application  in  this  6-month 
smoothed  monthly  sunspot  time  series,  the  introduced  prediction  method  based  on 
the  PMSI  algorithm  performed  as  well  as  its  application  on  an  EEG  time  series. 

5.4  Summary  and  Discussion 

In  this  chapter,  a forecasting  method  based  on  the  results  of  the  PMSI 
algorithm  was  introduced.  This  method  first  applies  the  multivariate  regression 
analysis  which  regresses  the  next  I2  points  on  the  previous  li  points  for  the 
subsequences  with  the  most  predictable  patterns  identified  by  the  PMSI  algorithm 
in  the  learning  period  of  the  time  series.  These  regression  equations  axe  then 
applied  in  the  prediction  period  to  make  predictions  if  the  subsequences  are  pattern 
Zi-matched  with  the  identified  patterns.  The  forecasting  performance  of  this 
method  was  evaluated  with  respect  to  the  average  prediction  error  in  the  prediction 
period  and  was  compared  with  the  ones  by  the  generally  used  autoregressive  AR 


models. 
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The  application  on  an  EEG  time  series  in  Section  5.2  showed  that,  in  the 
prediction  period,  the  average  prediction  errors  are  significantly  smaller  than  the 
ones  by  the  AR  models  for  aJl  different  length  of  subsequences,  especially  for  the 
farther  points  predictions.  This  result  indicates  that  the  most  predictable  pattern 
of  subsequences  identified  by  the  PMSI  algorithm  in  the  learning  period  remains  its 
consistency  in  a near  future  prediction  period.  This  remained  consistency  gives  the 
advantage  of  making  future  predictions. 

In  the  second  application  on  the  smoothed  Wolf’s  monthly  sunspot  time 
series  also  showed  its  superiority  of  predictions  to  the  AR  models.  In  this  example, 
the  predictions  were  focused  on  the  values  of  the  next  sixth  month  because  of  the 
6-month  smoothing  window.  For  different  length  of  subsequences,  the  average 
prediction  errors  by  their  regression  equations  are  all  significantly  smaller  than  the 
predictions  made  by  AR  models. 

For  both  applications,  as  shown  in  Tables  5.3  and  5.6,  different  l\  value  gives 
different  prediction  performance  and  this  performance  corresponds  to  the  value 
of  test  statistic  when  applying  the  PMSI  algorithm  to  identify  the  pattern  in  the 
learning  period.  In  other  words,  the  patterns  that  axe  more  significantly  consistent 
in  the  learning  period  give  the  better  predictions  in  the  prediction  period. 

In  conclusion,  based  on  the  identified  most  predictable  patterns  in  the 
learning  period,  the  proposed  prediction  method  gives  significantly  better 
predictions  than  AR  models  for  predicting  the  subsequences  which  are  pattern 
/i-matched  with  the  identified  patterns.  As  we  know,  this  method  does  not  provide 
the  prediction  at  any  time  point  in  the  prediction  period.  However,  if  we  identify 
all  the  significantly  predictable  patterns  in  the  learning  period  and  estimate  their 
regression  equations  separately,  more  predictions  for  different  patterns  can  be  made 
in  the  future  prediction  period. 


CHAPTER  6 
CONCLUSION 


Traditional  time  series  analysis  tries  to  identify  the  best  model  to  fit  the 
time  series  data  in  a learning  period  and  applies  this  model  to  make  predictions 
for  the  future  series.  However,  in  many  cases,  the  time  series  only  contains  a few 
meaningful,  or  predictable  patterns  and  thus  the  models  which  are  fitted  in  the 
entire  time  series  may  not  help  in  understanding  its  characteristics.  Furthermore, 
the  forecasting  for  such  time  series  is  generally  focused  on  the  occurrences  some 
events  that  may  occur  only  after  a certain  pattern  of  signals  being  observed.  In 
these  cases,  the  traditional  time  series  model  such  as  autoregressive  (AR)  model 
usually  gives  poor  predictions  since  the  patterns  useful  for  prediction  may  occur 
only  in  a small  portion  of  the  entire  time  series.  This  research  provides  a possible 
solution  to  this  problem. 

In  this  dissertation,  the  concept  of  pattern  match  criterion  to  investigate 
the  regularity /complexity  of  a given  time  series  was  introduced.  Based  on  this 
criterion,  two  regularity  scores  were  proposed  and  were  compared  with  the  widely 
used  Approximate  Entropy  (ApEn)  which  was  mainly  based  on  the  value  match 
criterion.  The  main  difficulty  for  both  criterions  is  the  selections  of  the  two 
parameters,  length  of  the  subsequences  to  be  compared  and  the  filtering  critical 
value.  Without  further  information  of  the  given  time  series,  it  is  almost  impossible 
to  decide  what  would  be  the  good  choices  of  these  two  parameters.  By  a study 
on  the  well  known  MIX(p)  process,  it  was  found  that  the  two  proposed  regularity 
scores  based  on  the  pattern  match  criterion  performed  consistently  well  with 
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respect  to  the  change  of  the  parameters,  while  the  ApEn  values  showed  very 
inconsistent  results. 

A new  algorithm  (PMSI)  for  identifying  the  signals  with  the  most  pre- 
dictable pattern  in  a time  series  was  developed  based  on  the  concept  of  the  pattern 
match  criterion  and  the  statistical  hypothesis  tests.  For  the  reasonable  values 
of  embedding  probabilities  (pe),  the  pattern  identifiability  by  this  algorithm  was 
analytically  proved  for  the  stationary  Ai2(l)  and  random  walk  processes.  The 
numerical  solutions  for  the  minimum  required  embedding  probabilities  for  different 
underlying  processes  and  different  length  of  patterns  were  also  provided.  The 
algorithm  was  then  tested  by  the  simulated  time  series  with  different  embedded 
signals.  An  epileptic  electroencephalogram  (EEG)  time  series,  and  well  known 
Wolf’s  monthly  sunspot  time  series  were  used  as  applications  of  this  algorithm.  In 
the  simulation  studies  and  the  two  applications,  the  PMSI  algorithm  successfully 
identified  the  most  predictable  patterns  in  the  time  series  using  a significant 
statistical  test.  Furthermore,  the  identified  patterns  were  consistent  with  different 
length  of  subsequences  compared  in  the  algorithm,  which  indicates  the  robustness 
of  the  PMSI  algorithm  against  unknown  parameters. 

Based  on  the  identified  pattern  of  signals  by  the  PMSI  algorithm  in  a 
learning  period,  a forecasting  method  was  also  proposed.  Multivariate  regression 
analysis  was  applied  to  fit  the  regression  models  on  the  most  predictable  pattern 
matched  subsequences  in  the  learning  period  and  used  these  regression  equations 
to  make  predictions  in  the  prediction  period.  The  performance  of  this  method 
was  then  compared  with  the  ones  by  the  AR  models.  In  the  two  applications, 

EEG  time  series  and  Wolf’s  monthly  sunspot  time  series,  it  was  observed  that  the 
proposed  forecasting  method  gave  significantly  better  predictions  than  AR  models, 
especially  for  the  farther  predictions,  i.e.,  more  than  one  step  ahead  predictions. 
However,  this  propose  forecasting  method  does  not  provide  predictions  for  all  the 
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future  values,  it  only  make  predictions  when  the  identified  most  predictable  pattern 
is  observed.  In  some  cases,  there  may  be  multiple  categories  of  events  following  the 
different  patterns.  Our  further  research  is  to  extend  the  PMSI  algorithm  for  the 
multiple  pattern  identification. 
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